Signal processing of multi-channel data

ABSTRACT

An approach for providing non-commutative approaches to signal processing. Quaternions are used to represent multi-dimensional data (e.g., three- and four-dimensional data). Additionally, a linear predictive coding scheme (e.g., based on the Levinson algorithm) that can be applied to wide class of signals in which the autocorrelation matrices are not invertible and in which the underlying arithmetic is not commutative. That is, the linear predictive coding scheme multi-channel can handle singular autocorrelations, both in the commutative and non-commutative cases. This approach also utilizes random path modules to replace the statistical basis of linear prediction.

FIELD OF THE INVENTION

The present invention relates to signal processing, and is more particularly related to linear prediction.

BACKGROUND OF THE INVENTION

Signals can represent information from any source that generates data, relating to electromagnetic energy to stock prices. Analysis of these signals is the focus of signal processing theory and practice. Linear prediction is an important signal processing technique that provides a number of capabilities: (1) prediction of the future of a signal from its past; (2) extraction of important features of a signal; and (3) compression of signals. The economic value of linear prediction is incalculable as its prevalence in industry is enormous.

It is observed that many important signals are “multi-channel” in that the signals are gathered from many independent sources; e.g., time series. For example, multi-channel data stem from the process of searching for oil, which requires measuring the earth at many locations simultaneously. Also, measuring the motions of walking (i.e., gait) requires simultaneously capturing the positions of many joints. Further, in a video system, a video signal is a recording of the color of every pixel on the screen at the same moment; essentially each pixel is essentially a separate “channel” of information. Linear prediction can be applied to all of the above disparate applications.

Conventional linear prediction techniques have been inadequate in the treatment of multi-channel time series, particularly, when the dimensionality is in the order is above three. There are traditional approaches of linear prediction for multi-channel signals, but are not effective in addressing the technical difficulties that are caused by the interactions of the sources of data. In single source signals, such as like voice, these difficulties are not encountered. The conventional techniques assume that the autocorrelation matrix of the data is invertible or can be made invertible by simple methods, which is rarely valid for real multi-channel data.

Also, such traditional approaches do not use the structural information available through modeling multi-dimensional geometry in a more sophisticated manner than merely as arrays of numbers. In addition, these approaches fail to take into account the phenomenon of time warping, which, for example, is critical to successful modeling of biometric time series. Further, conventional linear prediction techniques are based on a statistical foundation for linear prediction, which is not well suited for motion, video and other types of multi-channel data.

Further, it is recognized that most real multi-channel data are highly correlated. Under the conventional approaches, the popular linear prediction algorithm, known as the Levinson algorithm, cannot be applied to highly correlated channels.

Therefore, there is a need to provide a framework for extending applicability of linear prediction techniques. Additionally, there is a need for an approach to predict/compress/encrypt multi-channel multi-dimensional time series, particularly series with high correlation.

SUMMARY OF THE INVENTION

These and other needs are addressed by the present invention in which non-commutative approaches to signal processing are provided. In one embodiment, quaternions are used to represent multi-dimensional data (e.g., three- and four-dimensional data, etc.). Additionally, an embodiment of the present invention provides a linear predictive coding scheme (e.g., based on the Levinson algorithm) that can be applied to a wide class of signals in which the autocorrelation matrices are not invertible and in which the underlying arithmetic is not commutative. That is, the linear predictive coding scheme can handle singular autocorrelations, both in the commutative and non-commutative cases. Random path modules are utilized to replace the statistical basis of linear prediction. The present invention, according to one embodiment, advantageously provides an effective approach for linearly predicting multi-channel data that is highly correlated. The approach also has the advantage of solving the problem of time-warping.

In one aspect of the present invention, a method for providing linear prediction is disclosed. The method includes collecting multi-channel data from a plurality of independent sources, and representing the multi-channel data as vectors of quaternions. The method also includes generating an autocorrelation matrix corresponding to the quaternions. The method further includes outputting linear prediction coefficients based upon the autocorrelation matrix, wherein the linear prediction coefficients represent a compression of the collected multi-channel data.

In another aspect of the present invention, a method for supporting video compression is disclosed. The method includes collecting time series video signals as multi-channel data, wherein the multi-channel data is represented as vectors of quaternions. The method also includes generating an autocorrelation matrix corresponding to the quaternions, and outputting linear prediction coefficients based upon the autocorrelation matrix.

In another aspect of the present invention, a method of signal processing is provided. The method includes receiving multi-channel data, representing multi-channel data as vectors of quaternions, and performing linear prediction based on the quaternions.

In another aspect of the present invention, a method of performing linear prediction is provided. The method includes representing multi-channel data as a pseudo-invertible matrix, generating a pseudo-inverse of the matrix, and outputting a plurality of linear prediction weight values and associated residual values based on the generating step.

In another aspect of the present invention, a computer-readable medium carrying one or more sequences of one or more instructions for performing signal processing is disclosed. The one or more sequences of one or more instructions include instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of receiving multi-channel data, representing multi-channel data as vectors of quaternions, and performing linear prediction based on the quaternions.

In yet another aspect of the present invention, a computer-readable medium carrying one or more sequences of one or more instructions for performing signal processing is disclosed. The one or more sequences of one or more instructions include instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of representing multi-channel data as a pseudo-invertible matrix, generating a pseudo-inverse of the matrix, and outputting a plurality of linear prediction weight values and associated residual values based on the generating step.

Still other aspects, features, and advantages of the present invention are readily apparent from the following detailed description, simply by illustrating a number of particular embodiments and implementations, including the best mode contemplated for carrying out the present invention. The present invention is also capable of other and different embodiments, and its several details can be modified in various obvious respects, all without departing from the spirit and scope of the present invention. Accordingly, the drawing and description are to be regarded as illustrative in nature, and not as restrictive.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which like reference numerals refer to similar elements and in which:

FIG. 1 is a diagram of a system for providing non-commutative linear prediction, according to an embodiment of the present invention;

FIGS. 2A and 2B are diagrams of multi-channel data capable of being processed by the system of FIG. 1;

FIG. 3 is a flow chart of a process for representing multi-channel data as quaternions, according to an embodiment of the present invention;

FIG. 4 is a flowchart of the operation for performing non-commutative linear prediction in the system of FIG. 1; and

FIG. 5 is a diagram of a computer system that can be used to implement an embodiment of the present invention.

DESCRIPTION OF THE PREFERRED EMBODIMENT

A system, method, and software for processing multi-channel data by non-commutative linear prediction are described. In the following description, for the purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It is apparent, however, to one skilled in the art that the present invention may be practiced without these specific details or with an equivalent arrangement. In other instances, well-known structures and devices are shown in block diagram form in order to avoid unnecessarily obscuring the present invention.

The present invention has applicability to a wide range of fields in which multi-channel data exist, including, for example, virtual reality, doppler radar, voice analysis, geophysics, mechanical vibration analysis, materials science, robotics, locomotion, biometrics, surveillance, detection, discrimination, tracking, video, optical design, and heart modeling.

FIG. 1 is a diagram of a system for providing linear prediction, according to an embodiment of the present invention. As shown in FIG. 1, a multi-channel data source 101 provides data that is converted to quaternions by a data representation module 103. Quaternions have not been employed in signal processing, as conventional linear prediction techniques cannot process quaternions in that these techniques employ the concept of numbers, not points. According to one embodiment of the present invention, quaternions can be parsed into a rotational part and a scaling part; this construct, for example, can correct time warping, as will be more fully described below.

These quaternions are then supplied to a non-commutative linear predictor 105, which generates the linear prediction matrix 107 of weights and associated residuals. The linear predictor 105, in an exemplary embodiment, provides a generalization of the Levinson algorithm to process non-invertible autocorrelation matrices over any ring that admits compact projections. Linear predictive techniques conventionally have been presented in a statistical context, which excludes the majority of multi-channel data sources to which the linear predictor 105 is targeted.

The signal processing of spatial time series has been traditionally limited by the lack of a sophisticated link between the signal processing algebra and the spatial geometry. The ordinary algebra of the real or complex numbers satisfies the commutative law a×b=b×a and the law of inverses: for every non-zero number a there is a number

$\frac{1}{a}$ for which

${a \times \left( \frac{1}{a} \right)} = {{\left( \frac{1}{a} \right) \times a} = 1.}$ However, these properties fail for the quaternions and for three-dimensional multi-channel signal processing. The theories of hermitian regular rings and compact projections allow important signal processing techniques to be utilized in such situations.

One of the major application areas of the invention is to video image processing. To enable this application, color data needs to be correctly represented as four-dimensional spatial points. Photopic coordinates are four-dimensional analogs of the common RGB (Red-Green-Blue) colormetric coordinates.

Also, in gait analysis, for example, each joint reports where it currently is located. In the oil exploration example, each of many sensors spread over the area that is being searched sends back information about where the surface on which it is sitting is located after the geologist has set off a nearby explosion. The cardiology example requires knowing, for many structures inside and around the heart, how these structures move as the heart beats.

Even the video example can be seen that way because each pixel on the screen is reporting its color at every moment of time. However, a “color” is not a simple number: it is actually (at least) 3 numbers such as the amount of red, blue, and green (RGB) light needed to make that color. Those three numbers are usually thought of as being in a “color space” which is a kind of abstract space like three-dimensional space.

As mentioned, the present invention, according to one embodiment, represents each such point in space by a mathematical object called a “quaternion.” Quaternions can describe special information, such as rotations, perspective drawing, and other simple concepts of geometry. If a signal, such as the position of a joint during a walk is described using quaternions, it reveals structure in the signal that is hidden such as how the rotation of the knee is related to the rotation of the ankle as the walk proceeds.

FIGS. 2A and 2B are diagrams of multi-channel data capable of being processed by the system of FIG. 1. As shown in FIG. 2A, many practical datasets comprise time series . . . x_(n−2), x_(n−1), x_(n) of data vectors where, at each time n, the datum x_(n) is a vector

$x_{n} = \begin{pmatrix} {x_{n}(1)} \\ {x_{n}(2)} \\ \vdots \\ {x_{n}(K)} \end{pmatrix}$ of three-dimensional measurements. Each component x_(n)(k) represents the measurement of a single channel and is itself composed of three separate real numbers x_(n)(k)=(x_(n)(k)¹x_(n)(k)²x_(n)(k)³) corresponding to the three dimensions of whatever system that is being measured.

It is clear that cross-channel measurements can be represented as a list, x_(n):

${x_{n} = \begin{pmatrix} \begin{pmatrix} {x_{n}(1)}^{1} \\ {x_{n}(2)}^{1} \\ \vdots \\ {x_{n}(K)}^{1} \end{pmatrix} & \begin{pmatrix} {x_{n}(1)}^{2} \\ {x_{n}(2)}^{2} \\ \vdots \\ {x_{n}(K)}^{2} \end{pmatrix} & \begin{pmatrix} {x_{n}(1)}^{3} \\ {x_{n}(2)}^{3} \\ \vdots \\ {x_{n}(K)}^{3} \end{pmatrix} \end{pmatrix}},$ such as the RGB bitplanes of video and, in fact, this is usually how three-dimensional datasets are generated. However, the former representation is conceptually more basic.

As seen in FIG. 2B, a time series relating to the prices of stocks, for example, exist, and can be viewed as a single multi-channel data. In this example, three sources 201, 203, 205 can be constructed as a single vector based on time, t.

According to one embodiment of the present invention, multi-channel can be represented as quaternions. Specifically, the present invention provides an approach for analyzing and coding such time series by representing each measurement x_(n)(j) using the mathematical construction called a quaternion.

FIG. 3 is a flow chart of a process for representing multi-channel data as quaternions, according to an embodiment of the present invention. In step 301, multi-channel data is collected and then represented as quaternions, as in step 303. These quaternions, per step 305, are then output to a linear predictor (e.g., predictor 105 of FIG. 1).

As used herein, the quaternion algebra is denoted

. Quaternions are four-dimensional generalizations of the complex numbers and may be viewed as a pair of complex numbers (as well as many other representations). Quaternions also have the standard three-dimensional dot- and cross-products built into their algebraic structure along with four-dimensional vector addition, scalar multiplication, and complex arithmetic.

The quaternions have the arithmetical operations of +,−,×, and ÷ for non-0 denominators defined on them and so provide a scalar structure over which vectors, matrices, and the like may be constructed. However, the peculiarity of quaternions is that multiplication is not commutative: in general, q×r≠r×q for quaternions q,r and thus

forms a division ring, not a field.

The present invention, according to one embodiment, presented herein stems from the observation that many traditional signal processing algorithms, especially those pertaining to linear prediction and linear predictive coding, do not depend on the commutative law holding among the scalars once these algorithms are carefully analyzed to keep track of which side (left or right) scalar multiplication takes place.

As a result, a three- (or four-) dimensional data point can be thought of as a single arithmetical entity rather than a list of numbers. There are great advantages to be gained, both conceptually and practically, by doing so.

As mentioned previously, the application of present invention spans a number of disciplines, from biometrics to virtual reality. For instance, all human control devices from the mouse or gaming joystick up to the most complex virtual reality “suit” are mechanisms for translating spatial motion into numerical time series. One example is a “virtual reality” glove that contains 22 angle-sensitive sensors arrayed on a glove. Position records are sent from the glove to a server at 150 records/sensor/sec at the RS-232 rate of 115.2 kbaud. After conversion to rectangular coordinates, this is precisely a 22-channel time series . . . x_(n−2), x_(n−1), x_(n),

$x_{n} = \begin{pmatrix} {x_{n}(1)} \\ {x_{n}(2)} \\ \vdots \\ {x_{n}(22)} \end{pmatrix}$ of three-dimensional data as discussed above.

The high data rate and sensor sensitivity of the virtual glove is sufficient to characterize hand positions and velocities for ordinary motion. However, the human hand is capable of “extraordinary” motion; e.g., a skilled musician or artisan at work. For example, both pianists and painters have the concept of “touch”, an indefinable relation of the hand/finger system to the working material and which, to the trained ear or eye, characterizes the artist as well as a photograph or fingerprint. It is just such subtle motions, which unerringly distinguish human actions from robotic actions.

Even to begin the modeling and reproduction of the true human hand, much higher data rates, much more precise sensors, and much denser sensor array are required. The numbers are comparable, in fact, to the data rates, volume, and density of the nervous system connecting the hand to the brain. At such levels, efficient storing and transmission of such multi-channel data become critical. It is not sufficient to save bandwidth by transmitting only every tenth or hundredth hand position of a pilot landing a jet fighter on the flight deck of a carrier. Instead, the time series need to be globally compressed so that actual redundancy (introduced by inertia and physiological/geometric constraints) but not critical information is removed.

Multi-channel analysis is also utilized in geophysics. Geophysical explorers, like special effects people in cinema, are in the enviable position of being able to set off large explosions in the course of their daily work. This is a basic mode of gathering geophysical data, which arrives from these earth-shaking events (naturally occurring or otherwise) in the form of multi-channel time series recording the response of the earth's surface to the explosions. Each channel represents the measurements of one sensor out of a strategically-designed array of sensors spread over a target area.

While the input data series of any one channel is typically one-dimensional, representing the normal surface strain at a point, the target series is three-dimensional; namely, the displacement vector of each point in a volume. Geophysics is, more than most sciences, concerned with inverse problems: given the boundary response of a mechanical system to a stimulus, determine the response of the three-dimensional internal structure. As oil and other naturally occurring resources become harder to find, it is imperative to improve the three-dimensional signal processing techniques available.

Similar to geophysicists, mechanical engineers examine system response measurements. Typically, a body is covered in a multi-channel network of strain or motion sensors and shakers is attached at selected points. The data usually is transferred to a finite-element model of the system, which is a triangularization of the three-dimensional physical system. Abstractly, these finite-element datasets are nothing more than the multi-channel three-dimensional time series.

Multi-channel analysis also has applicability to biophysics. If a grid is placed over selected points of photographed animals' bodies, and concentrated especially around the joints, time series of multi-channel three-dimensional measurements can be generated from these historical datasets by standard photogrammetric techniques.

The human knee is a complex mechanical system with many degrees of freedom most of which are exercised during even a simple stroll. This applies to an even greater degree to the human spine, with its elegant S-shape, perfectly designed to carry not only the unnatural upright stance of homo sapiens but to act as a complex linear/torsional spring with infinitely many modes of behavior as the body walks, jumps, runs, sleeps, climbs, and, not least of all, reproduces itself. Many well-known neurological diseases, such as multiple sclerosis, can be diagnosed by the trained diagnostician simply by visual observation of the patient's gait.

Paleoanthropologists use computer reconstructions of hominid gaits as a basic tool of their trade, both as an end product of research and a means of dating skeletons by the modernity of the walk they support. Animators are preeminent gait modelers, especially these days when true-to-life non-existent creatures have become the norm.

The present invention also applicability to biometric identification. Closely related to the previous example is the analysis of real human individuals' walking characteristics. It is observed that people frequently can be identified quite easily at considerable distances simply by their gait, which seems as characteristic of a person as his fingerprints. This creates some remarkable possibilities for the identification and surveillance of individuals by extracting gait parameters as a signature.

It might be possible, for example, to establish the identity of a criminal suspect through analysis of gait characteristics from closed circuit television (CCTV) recording, even when the quality of these videos is too poor to isolate facial structure. A system could be constructed that would follow a particular individual through, say, a crowded airport or cityscape by identifying his walking signature via CCTV. An ordinary disguise, of course, will not fool such a system. Even the conscious attempt to walk differently may not succeed because the primary determinants of gait (such as the particular mechanical properties of the spine/pelvis interface) may be beyond conscious control.

The present invention, additionally, is applicable to detection, discrimination, and tracking of targets. There are many targets which move in three spatial dimensions and which it may be desirable to detect and track. For example, a particular aircraft or an enemy submarine in the ocean. Although there are far fewer channels than in gait analysis, these target tracking problems have a much higher noise floor.

There are many well-known techniques of adapting linear prediction to noisy signals, one of the simplest yet most effective being to manually adjust the diagonal coefficients of the autocorrelation matrix.

Multi-channel analysis can also be applied to video processing. Spatial measurements are not the only three-dimensional data which has to be compressed, processed, and transmitted. Color is (in the usual formulations) inherently three-dimensional in that a color is determined by three values: RGB, YUV (Luminance-Bandwidth-Chrominance), or any of the other color-space systems in use.

A video stream can be modeled by the same time series . . . x_(n−2), x_(n−1), x_(n) approach that has been traditionally employed, except that now a channel corresponds to a single pixel on the viewing screen:

$x_{n} = \begin{pmatrix} {C_{n}(11)} & \cdots & {C_{n}\left( {1N} \right)} \\ \vdots & ⋰ & \vdots \\ {C_{n}({M1})} & \cdots & {C_{n}({MN})} \end{pmatrix}$

where C_(n)(jk)=(C_(n)(jk)^(R) C_(n)(jk)^(G) C_(n)(jk)^(B)) are the three color coordinates at time n in, for example, the RGB system of pixel j,k out of a total resolution of (M×N) pixels.

As mentioned previously, many hardware systems require the data to be arranged in the dual form of three value planes rather than planes of three values. With the large quantity of data represented by . . . x_(n−2), x_(n−1), x_(n), compression is the key to successful video manipulation. For example, there is increasing pressure for corporate intranets to carry internal video signals and, for these applications, security is a critical necessity from the outset.

According to one embodiment, the present invention introduces the concept of photopic coordinates; it is shown that, just as in spatial data, color data is modeled effectively by quaternions. This construct permits application of the non-commutative methods to color images and video a reanalysis of the usual color space has to be performed, recognizing that color space has an inherent four-dimensional quality, in spite of the three-dimensional RGB and similar systems.

Many signal processing problems are presented in the form of overlapping frames laid over a basic single-channel time series:

$\begin{matrix} x_{1} & x_{2} & \cdots & x_{K} \end{matrix}\mspace{25mu} x_{{K + 1}}\mspace{34mu}\cdots\mspace{14mu} x_{n}\mspace{14mu}\cdots$ $\mspace{14mu}{x_{1}\mspace{25mu}\cdots\mspace{11mu}\begin{matrix} x_{d + 1} & x_{d + 2} & \cdots & x_{d + K} \end{matrix}\mspace{14mu}\cdots\mspace{14mu} x_{n}\mspace{14mu}\cdots}$ $\mspace{14mu}{x_{1\mspace{14mu}}\mspace{14mu} x_{2}\mspace{169mu}{\vdots\mspace{340mu}.\mspace{14mu} x_{1\mspace{14mu}}}\mspace{14mu} x_{2}\mspace{25mu}\cdots\mspace{85mu}\begin{matrix} x_{{md} + 1} & x_{{md} + 2} & \cdots & x_{{md} + K} \end{matrix}\mspace{14mu}\cdots}$                        ⋮

High-resolution spectral analysis by linear prediction or some other method is performed separately within each frame

$\;\begin{matrix} x_{{md} + 1} & x_{{md} + 2} & \cdots & x_{{md} + K} \end{matrix}$ and then the resulting power spectra P₀(ω), P₁(ω), . . . , P_(m)(ω), . . . are analyzed as a new data sequence.

This is the traditional approach in voice analysis where the resulting spectra are presented in the well-known spectrogram form. However, it is used in many other applications such as the Doppler radar analysis of rotating bodies in which the distances of reflectors from the axis of rotation can be deduced from the instantaneous spectra of the returned signal.

More generally, this frame-based spectral analysis can be regarded as the demodulation of an FM (Frequency Modulation) signal because the information that is to be extracted is contained in the instantaneous spectra of the signal. Unfortunately, this within-frame approach ignores some of the most important information available; namely the between-frame correlations.

For example, in the rotating Doppler radar problem, a single rotating reflector gives rise to a sinusoidally oscillating frequency spike in the spectra sequence P₀(ω), P₁(ω), . . . , P_(m)(ω), . . . . The period of oscillation of this spike is the period of rotation of the reflector in space while the amplitude of the spike's oscillation is directly proportional to the distance of the reflector from the axis of rotation. These oscillation parameters cannot be read directly from any individual spectrum P_(m)(ω) because they are properties of the mutual correlations between the entire sequence P₀(ω), P₁(ω), . . . , P_(m)(ω), . . . .

This point is brought out especially well in the presence of noise which, as is well-known, has a strongly deleterious effect on any high-resolution spectral analysis method. An individual spectrum P_(m)(ω) may not exhibit any discernable spike but since it is known that there is an underlying oscillation in the series P₀(ω), P₁(ω), . . . , P_(m)(ω), . . . , a way exists to combine these spectra to filter out the cross-frame noise.

It is recognized that by imposing the frame structure on the time sequence, the signal is transformed into a multi-channel sequence:

$\begin{matrix} x_{1} \\ x_{2} \\ \vdots \\ x_{K} \end{matrix},\begin{matrix} x_{d + 1} \\ x_{d + 2} \\ \vdots \\ x_{d + K} \end{matrix},\cdots\mspace{11mu},\begin{matrix} x_{{md} + 1} \\ x_{{md} + 2} \\ \vdots \\ x_{{md} + K} \end{matrix},\cdots$ with the number of channels K equal to the frame width.

As is more fully described below, linear predictive analysis of such a multi-channel sequence gives rise to coefficients a₁, . . . , a_(m), . . . which are (K×K) matrices rather than single scalars. Thus, the spectra P_(m)(ω) produced by these coefficients are themselves (K×K) matrices.

However, the correlations that are sought after, such as the oscillation patterns produced by rotating radar reflectors, cause these power spectra matrix sequences P₀(ω), P₁(ω), . . . , P_(m)(ω), . . . to become singular; i.e., the autocorrelation matrices of P₀(ω), P₁(ω), . . . , P_(m)(ω), . . . (which are matrices whose entries are themselves matrices) becomes non-invertible. In fact, the non-inevitability of this matrix is equivalent to cross-spectral correlation.

Unfortunately, the prior approaches to linear prediction break down at this exact point because these conventional approaches cannot handle the problem of channel degeneracy.

The present invention, according to one embodiment, advantageously operates in the presence of highly degenerate data.

As noted, the present invention can be utilized in the area of optics. It has been understood that optical processing is a form of linear filtering in which the two-dimensional spatial Fourier transforms of the input images are altered by wavenumber-dependent amplitudes of the lens and other transmission media. At the same time, light itself has a temporal frequency parameter ν which determines the propagation speed and the direction of the wave fronts by means of the frequency-dependent refractive index. Thus, the abstract optical design and analysis problem is determining the relation between the four-component wavevector ({right arrow over (σ)},ν) and the on the four-component space-time vector ({right arrow over (x)},t) on each point of a wavefront as it moves through the optical system.

Both ({right arrow over (σ)},ν) and ({right arrow over (x)},t) for a single point on a wavefront can be viewed as series of four-dimensional data, and thus, a mesh of points on a wavefront generates two sets of two-dimensional arrays of four-dimensional data. As is seen, ({right arrow over (σ)},ν),({right arrow over (x)},t) are naturally structured as quaternions. There are many possibilities for joint linear predictive analysis of these series. In particular, estimating the four-dimensional power spectra by solving for the all-pole filter produced by the linear prediction model.

Passing from two-dimensional arrays of three-dimensional data, there are many applications which require three-dimensional arrays of three-dimension data. For example, the stress of a body is characterized by giving, for every point (x,y,z) inside the unstressed material, the point (x+εx,y+εy,z+εy) to which (x,y,z) has been moved. If a uniform grid of points (lΔx,mΔy,nΔz), {l,m,n}⊂

³ defines the body, then the three-dimensional array

$\left. \begin{pmatrix} \; & \vdots & \; \\ \cdots & \left( {{\delta\; x},{\delta\; y},{\delta\; z}} \right)_{l,m,n} & \cdots \\ \; & \vdots & \; \end{pmatrix}\rightarrow \right)$

of three-dimensional data approximates the stress. For example, from this matrix, an approximation of the stress tensor may be derived.

A good example of the use of these ideas is three-dimensional, dynamic modeling of the heart. The stress matrix can be obtained from real-time tomography and then linear predictive modeling can be applied. This has many interesting diagnostic applications, comparable to a kind of spatial EKG (Electrocardiogram).

As is discussed later, the system response of the quaternion linear filter is a function of two complex values (rather than one as in the commutative situation). Thus the “poles” of the system response really is a collection of polar surfaces in

×

≅

⁴. Because of the strong quasi-periodicities in heart motion and because the linear prediction filter is all-pole, these polar surfaces can be near to the unit 3-sphere (the four-dimensional version of the unit circle) in

⁴.

The stability of the filter is determined by the geometry of these surfaces, especially by how close they approach the 3-sphere. It is likely that this can be translated into information about the stability of the heart motion, which is of great interest to cardiologists.

FIG. 4 is a flowchart of the operation for performing non-commutative linear prediction in the system of FIG. 1. Linear prediction (LP) has been a mainstay of signal processing, and provides, among other advantages, compression and encryption of data. Linear prediction and linear predictive coding, according to one embodiment of the present invention, requires computation of an autocorrelation matrix of the multi-channel data, as in step 301. While theoretically creating the possibility of significant compression of multi-channel sets, such high degrees of correlation also create algorithmic problems because it causes the key matrices inside the algorithms to become singular or, at least, highly unstable. This phenomenon can be termed “degeneracy” because it is the same effect which occurs in many physical situations in which energy levels coalesce due to loss of dimensionality.

Degeneracy cannot be removed simply by looking for “bad” channels and eliminating them. For one thing, such a scheme is too costly in time, and fundamentally flawed, because degeneracy is a global or system-wide phenomenon. The problem of degeneracy of multi-channel data has generally been ignored by algorithm designers. For example, traditional approaches only consider the case in which the autocorrelation matrices are either non-singular (another way of saying the system is not degenerate) or that the singularity can be confined to a few deterministic channels. Without this assumption, the popular linear prediction method, referred to as the Levinson algorithm, fails in its usual formulation.

Real multi-channel data, as discussed above, can be expected to be highly degenerate. The present invention, according to one embodiment, can be used to formulate a version of the Levinson algorithm that does not assume non-degenerate data. This is accomplished by examining the manner in which matrix inverses enter into the algorithm; such inverses can be replaced by pseudo-inverses. This is an important advance in multi-channel linear prediction even in the standard commutative scalar formulations.

In step 303, pseudo-inverses of the autocorrelation matrix are generated, thereby overcoming any limitations stemming for the non-inevitability problem. The linear predictor then outputs the linear prediction matrix containing the LP coefficients and residuals, per step 305.

The general idea of compression is that any data set contains hidden redundancy which can be removed, thus reducing the bandwidth required for the data's storage and transmission. In particular, predictive coding removes the redundancy of a time series . . . x_(n−2), x_(n−1), x_(n) by determining a predictor function

( ) and a new residual data series . . . e_(n−2), e_(n−1), e_(n) for which x _(n)=

(x _(n−1) ,x _(n−2), . . . )+e _(n) for every n in an appropriate range. Ideally,

( ) will depend on relatively few parameters, analogous to the coefficients of a system of differential equations and which are transmitted at the full bit-width, while . . . e_(n−2), e_(n−1), e_(n) will have relatively low dynamic range and thus can be transmitted with fewer bits/symbol/time than the original series. The series, . . . e_(n−2), e_(n−1), e_(n), can be thought of as equivalent to the series . . . x_(n−2), x_(n−1), x_(n) but with the deterministic redundancy removed by the predictor function

( ). Equivalently, . . . e_(n−2), e_(n−1), e_(n) is “whiter” than . . . x_(n−2), x_(n−1), x_(n); i.e., has higher entropy per symbol.

The compression can be increased by allowing lossy reconstruction in which only a fraction (possibly none) of the residual series . . . e_(n−2), e_(n−1), e_(n) is transmitted/stored. The missing residuals are reconstructed as 0 or some other appropriate value.

Encryption is closely associated with compression. Encryption can be combined with compression by encrypting the

( ) parameters, the residuals . . . e_(n−2), e_(n−1), e_(n), or both. This can be viewed as adding encoded redundancy back into the compressed signal, analogous to the way error-checking adds unencoded redundancy.

Linear prediction and linear predictive coding use a finite linear function

(x _(n−1) ,x _(n−2) ,x _(n−3), . . . )=−a ₁ x _(n−1) −a ₂ x _(n−2) −a ₃ x _(n−3) . . . −a _(M) x _(n−M) with constant coefficients as the predictor.

So defining a₀=1, the full LP model of order M is

${\sum\limits_{m = 0}^{M}{a_{m}x_{n - m}}} = e_{n}$

It is noted that when each x_(n) is a K-channel datum, the coefficients a_(m) must be (K×K) matrices over the scalars (typically

,

, or

).

A number of non-LP coding schemes exists, such as the Fourier-based JPEG (Joint Photographic Experts Group) standard. The LP models have a universality and tractability which make them benchmarks.

Linear prediction becomes statistical when a probabilistic model is assumed for the residual series, the most common being independence between times and multi-normal within a time; that is, between channels at a single moment of time when each x_(n) is a multi-channel data sample.

The property enjoyed by the multi-normal density

${{\phi\left( {x_{1},\cdots\mspace{11mu},x_{n}} \right)} = {{\phi\left( \overset{\rightarrow}{x} \right)} = {\frac{1}{\left( {2\;\pi} \right)^{n/2}}\frac{1}{\sqrt{\det\mspace{11mu}\Sigma}}{\mathbb{e}}^{{- \frac{1}{2}}{({\overset{\rightarrow}{x} - \overset{\rightarrow}{\mu}})}^{T}{\Sigma^{- 1}{({\overset{\rightarrow}{x} - \overset{\rightarrow}{\mu}})}}}}}},$ where Σ is the covariance matrix and {right arrow over (μ)} the mean of {right arrow over (x)}, and no other distribution is that uncorrelated multi-normal random variables are statistically independent. As a result, “independent” in the sense of linear algebra is identical to “independent” in the sense of probability theory. By linearly transforming the variables to the principal axes determined by the eigenstructure of Σ, consideration can be narrowed to independent, normally distributed random variables. The residuals can be tested for significance using standard χ²- or F-tests, analysis of variance (ANOVA) tables can be constructed, and the rest.

In essence, then, any advancement of linear predictive coding must either improve the linear algebra or improve the statistics or both.

The present invention advances the linear algebra by introducing non-commutative methods, with the quaternion ring

as a special case, into the science of data coding. The present invention also advances the statistics by reanalyzing the basic assumptions relating linear models to stationary, ergodic processes. In particular, it is demonstrated by analyzing source texts that linear prediction is not a fundamentally statistical technique and is, rather, a method for extracting structured information from structured messages.

Like all signal processing methodologies, the three-dimensional, non-commutative technique is a series of modeling “choices,” not just one algorithm applicable to all situations. As a result of this and due to the unfamiliarity of many of the mathematical concepts being used, an attempt is made to provide a reasonably self-contained presentation of the context in which the modeling takes place.

In statistical signal processing, LP appears as autoregressive models (AR). These are a special case of autoregressive-moving average models (ARMA) which, unlike AR models, have both poles and zeros; i.e. modes and anti-modes. For example, in radar applications, the same general class of techniques are usually called autoregressive spectral analysis and have found diverse applications including target identification through LP analysis of Doppler shifts.

As pointed out previously, the K-channel linear predictive model is as follows:

${{\underset{m = 0}{\sum\limits^{M}}{a_{m}x_{n - m}}} = e_{n}},$ which requires the coefficients a_(m) to be (K×K) matrices which, in general, do not commute: a·b=b·a. As is discussed below, when the entries of the matrices a_(m) themselves are commutative, the non-commutativity of the a_(m) can be controlled at the determinants since det(a·b)=det(b·a) even when a·b=b·a.

However, once the matrices are composed of non-commutative entries, the determinant is no longer useful. This results, for example, if higher-order prediction is to be performed in which multiple channels of series (which are themselves multi-channel series are utilized). This is not an abstraction: many real series are presented in this form. For example, it may be the case that the multi-channel readings of geophysical experiments from many separate locations are used and it is desired to assemble them all into a single predictive model for, say, plate tectonic research. It is not the case that the model derived by representing all channels into a large, flat matrix is the same as that obtained by regarding the coefficients a_(m) as matrices whose entries are also matrices.

The general linear prediction problem is thus concerned with the algebraic properties of the set

(n,m,A) of (n×m) matrices whose entries are in some scalar structure A. Appropriate scalar structures is discussed in below with respect to quaternion representations. In many cases, however, A is itself a matrix structure

(k,l,B). There is thus a tendency to regard aε

(n,m,A), with A=

(k,l,B), as “really” structured as aε

(nk,ml,B):

$\begin{matrix} {{\underset{\downarrow}{\overset{\uparrow}{n}}\mspace{14mu}\overset{\leftarrow\mspace{45mu}{m\mspace{40mu}\rightarrow}}{\begin{pmatrix} a_{11} & \cdots & a_{1m} \\ \vdots & ⋰ & \vdots \\ a_{n\; 1} & \cdots & a_{n\; m} \end{pmatrix}}},{a_{v\;\mu} = {\underset{\downarrow}{\overset{\uparrow}{k}}\mspace{14mu}\overset{\leftarrow\mspace{31mu}{l\mspace{31mu}\rightarrow}}{\begin{pmatrix} a_{{v\;\mu},11} & \; & a_{{v\;\mu},{1l}} \\ \; & \; & \; \\ \; & \; & \; \\ a_{{v\;\mu},{k\; 1}} & \; & a_{{v\;\mu},{kl}} \end{pmatrix}}}}} \\  \Updownarrow \\ {\underset{\downarrow}{\overset{\uparrow}{nk}}\mspace{14mu}{\overset{\leftarrow\mspace{85mu}{{m\; l}\mspace{85mu}\rightarrow}}{\begin{pmatrix} a_{11,11} & \cdots & a_{12,11} & \cdots & \; & \; & \; & \; & \cdots & a_{{1m},{1l}} \\ \vdots & ⋰ & \vdots & ⋰ & \; & \; & \; & \; & \ddots & \vdots \\ \; & \; & \; & \; & \; & \; & \; & \; & \; & \; \\ \; & \; & \; & \; & \; & \; & \; & \; & \; & \; \\ \vdots & \ddots & \vdots & \ddots & \; & \; & \; & \; & ⋰ & \vdots \\ a_{{n\; 1},{k\; 1}} & \cdots & a_{{n\; 2},{k\; 1}} & \cdots & \; & \; & \; & \; & \cdots & a_{{n\; m},{kl}} \end{pmatrix}}.}} \end{matrix}$

However, this is a distorted way of viewing the problem because the internal coefficients a_(νμ,στ) are functioning on a deeper level than the external coefficients a_(νμ). In more concrete terms, as mentioned above the solution to the linear prediction problem corresponding to aε

(n,m,A) has nothing whatsoever to do with the linear prediction problem corresponding to aε

(nk,ml,B).

The correct metaphor is to regard the expression

(n,m,−) as defining a matrix class in the sense of object-oriented programming, then for any object A,

(n,m,A) is an object inheriting the properties of

(n,m,−), and utilizing the arithmetic of A to define operations such as matrix multiplication and addition. A itself inherits from a general scalar class defining the arithmetic of A. However, these classes are so general that

(n,m,A) itself can be regarded as a scalar object, using its defined arithmetic. Accordingly, in the other direction, the scalar object A might itself be some matrix object

(k,l,B).

In spite of the degree of abstraction this metaphor requires, it is the only one which correctly captures the general multi-channel situation. It is easy to imagine real-world multi-channel situations, such as the geophysics situation described previously, in which deep inheritance hierarchies are generated.

The present invention, according to one embodiment, addresses special cases of this general data-structuring problem, in which the introduction of non-commutative algebra into signal processing is a major advance towards a solution of the general case. The reason that multi-channel linear prediction produces significant data compression is the large cross-channel and cross-time correlation. This implies a high degree of redundancy in the datasets which can be removed, thereby reducing the bandwidth requirements.

Correlations are introduced in mechanical finite-element systems by physical constraints of shape, boundary conditions, material properties, and the like as well as the inertia of components with mass. This is also true for animal/robotic motion whose strongest constraints are due the semi-rigid structure of bone or metal.

In fact, as noted previously, multi-channel data is actually steeped with correlations—which was not an issue for single-channel processing. For example, when a single-channel linear predictor has been able to reduce the prediction error of a signal to 0, this can be interpreted as a sign of highly successful compression: it is demonstrated that the channel is carrying a deterministic sum of damped exponentials whose values can be determined by locating the roots of the characteristic polynomial of the system. In reality, things are not this simple; in practice, one regards a “perfect” linear prediction as indicative of too many coefficients and reduces the model order accordingly. However, things are far more complicated for multi-channel analysis because a large number of “perfect” channels are used.

That part of ordinary calculus, of any number of real or complex variables, which goes beyond simple algebra, is based in the fact that

is a metric space for which the compact sets are precisely the closed, bounded sets. The higher-dimensional spaces

^(n),

^(n) inherit the same property. The algebra of

,

plus the simple geometric combinatorics of covering regions by boxes allow all of calculus, complex, analysis, Fourier series and integrals, and the rest to be built up in the standard manner from this compactness property of

.

Topologically and metrically, the quaternion ring is simply

⁴; with careful use of quaternion algebra (especially the non-commutativity), the same development can be followed for

. All the standard results such as the Cauchy Integral Theorem, the Implicit Function Theorem, and the like have their quaternion analogs (often in left- and right-forms because of non-commutativity).

As a consequence, there is no problem in developing

-versions of z-transforms and Laurent series, hence the P(z) and D(z) of the previous section. In fact, the theory of quaternion system functions is much richer than for the complex field because as is shown later, a quaternion variable z consists of two independent complex variables

$\begin{pmatrix} z_{+} \\ z_{-} \end{pmatrix}.$ Many unexpected frequency-domain phenomena will appear, unknown from the one variable situation, because of the geometric and analytic interactions of z₊ and z⁻.

Because

is non-commutative, the det( ) operator does not behave “properly”. The most important property of det( ) which fails over

is its invariance under multiplication of columns or rows by a scalar; i.e., it is generally the case that

${{\det\begin{pmatrix} \begin{matrix} a_{11} \\ \vdots \\ a_{M\; 1} \end{matrix} & \cdots & {k\begin{pmatrix} a_{1j} \\ a_{ij} \\ a_{Mj} \end{pmatrix}} & \cdots & \begin{matrix} a_{1N} \\ a_{iN} \\ a_{MN} \end{matrix} \end{pmatrix}} \neq {k\mspace{14mu}{\det\begin{pmatrix} \begin{matrix} a_{11} \\ \vdots \\ a_{M\; 1} \end{matrix} & \cdots & \begin{pmatrix} a_{1j} \\ a_{ij} \\ a_{Mj} \end{pmatrix} & \cdots & \begin{matrix} a_{1N} \\ a_{iN} \\ a_{MN} \end{matrix} \end{pmatrix}}}},$ for kε

.

As a result, basic identities such as det(ab)=det(a)det(b) and Cramer's Rule also fail.

Importantly, it is not the case that a matrix a over

is invertible if and only if det(a) is invertible in

. This is because the matrix adjoint a^(adj) generally satisfies a·a^(adj)≠det(a)·1 over non-commutative rings.

The present invention advantageously permits application of the Levinson algorithm in a wide class of cases in which the autocorrelation coefficients are not in a commutative field. In particular, it is shown that the modified Levinson algorithm applies to quaternion-valued autocorrelations, hence, for example, to 3 and (3+1)-dimensional data.

The algebra of complex numbers can be viewed as ordered pairs of real numbers (a,b), referred to as couplets. Addition was defined by the rule (a,b)+(c,d)=(a+c,b+d) and, most importantly, multiplication defined by the rule: (a,b)·(c,d)=(ac−bd,ad−bc).

It has been shown that with these definitions, couplets could be added, subtracted, multiplied, and, when the divisor did not equal (0,0), divided as well.

Thus, i=√{square root over (−1)} can be simply defined as the couplet (0,1), while the couplet 1 (which is different in an abstract sense from the number 1) was defined to be (1,0).

Any couplet (a,b) could then be written uniquely in the form (a,b)=a(1,0)+b(0,1)=a1+bi=a+bi and the link to the complex numbers was complete.

An equivalent representation of the complex number a+bi is the (2×2) real matrix:

${〚\;{a + {bi}}\; 〛} = {\begin{pmatrix} a & b \\ {- b} & a \end{pmatrix}.}$

This representation is important for understanding the more complicated quaternion representations.

Using the ordinary laws of matrix arithmetic, the following exists:

${{〚\;{a + {bi}}\; 〛} + {〚\;{c + {di}}\; 〛}} = {{\begin{pmatrix} a & b \\ {- b} & a \end{pmatrix} + \begin{pmatrix} c & d \\ {- d} & c \end{pmatrix}} = {\begin{pmatrix} {a + c} & {b + d} \\ {- \left( {b + d} \right)} & {a + c} \end{pmatrix} = {〚{\left( {a + {bi}} \right) + \left( {c + {di}} \right)}〛}}}$ and ${{s \cdot {〚\;{a + {bi}}\; 〛}} = {{s \cdot \begin{pmatrix} a & b \\ {- b} & a \end{pmatrix}} = {\begin{pmatrix} {s \cdot a} & {s \cdot b} \\ {{- s} \cdot b} & {s \cdot a} \end{pmatrix} = {〚\;{s \cdot \left( {a + {bi}} \right)}〛}}}},{{{for}\mspace{14mu}{any}\mspace{14mu} s} \in {R.}}$

Most significantly,

$\begin{matrix} {{{〚\;{a + {bi}}\; 〛} \cdot {〚\;{c + {di}}〛}} = {\begin{pmatrix} a & b \\ {- b} & a \end{pmatrix}\begin{pmatrix} c & d \\ {- d} & c \end{pmatrix}}} \\ {= \begin{pmatrix} {{a\; c} - {bd}} & {{ad} + {bc}} \\ {- \left( {{ad} + {bc}} \right)} & {{a\; c} - {bd}} \end{pmatrix}} \\ {= {{〚{\left( {a + {bi}} \right) \cdot \left( {c + {di}} \right)}〛}.}} \end{matrix}$

In this representation,

${1 = {{〚\; 1\; 〛} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}}},{I = {{〚\; i\; 〛} = \begin{pmatrix} 0 & 1 \\ {- 1} & 0 \end{pmatrix}}}$ and thus

$\begin{matrix} {{〚\;{a + {bi}}\; 〛} = {\begin{pmatrix} a & b \\ {- b} & a \end{pmatrix} = {{{a \cdot \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}} + {b \cdot \begin{pmatrix} 0 & b \\ {- b} & 0 \end{pmatrix}}} = {{a \cdot 1} + {b \cdot I}}}}} \\ {I^{2} = {{\begin{pmatrix} 0 & 1 \\ {- 1} & 0 \end{pmatrix}\begin{pmatrix} 0 & 1 \\ {- 1} & 0 \end{pmatrix}} = {\begin{pmatrix} {- 1} & 0 \\ 0 & {- 1} \end{pmatrix} = {- 1}}}} \end{matrix}$ and so, once again, the law i²=−1 receives a clear interpretation.

Also the complex conjugate is represented by the transpose:

${〚\left( {a + {bi}} \right)^{*}〛} = {{〚{a - {bi}}〛} = {\begin{pmatrix} a & {- b} \\ b & a \end{pmatrix} = {\begin{pmatrix} a & b \\ {- b} & a \end{pmatrix}^{T} = {〚{a + {bi}}〛}^{T}}}}$ and the squared norm |z|² represented by the determinant

${{a + {bi}}}^{2} = {{a^{2} + b^{2}} = {{\det\begin{pmatrix} a & b \\ {- b} & a \end{pmatrix}} = {{\det\; 〚\;{a + {bi}}\; 〛}.}}}$

The following is noted:

${\begin{pmatrix} a & b \\ {- b} & a \end{pmatrix} \cdot \begin{pmatrix} a & b \\ {- b} & a \end{pmatrix}^{T}} = {{\begin{pmatrix} a & b \\ {- b} & a \end{pmatrix} \cdot \begin{pmatrix} a & {- b} \\ b & a \end{pmatrix}} = {{\left( {a^{2} + b^{2}} \right) \cdot \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}} = {\left\lbrack {\det\begin{pmatrix} a & b \\ {- b} & a \end{pmatrix}} \right\rbrack \cdot \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}}}}$ and similarly

${\begin{pmatrix} a & b \\ {- b} & a \end{pmatrix}^{T} \cdot \begin{pmatrix} a & b \\ {- b} & a \end{pmatrix}} = {\left\lbrack {\det\begin{pmatrix} a & b \\ {- b} & a \end{pmatrix}} \right\rbrack \cdot {\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.}}$

A real matrix C is called “orthogonal” if CC^(T)=C^(T)C=1, and the set of (n×n) real orthogonal matrices is denoted O(n). O(n) is a group under multiplication. A real matrix C is “extended orthogonal” if it satisfies the more general rule CC ^(T) =C ^(T) C=r·1 for some rε

and the set of (n×n) extended orthogonal matrices is denoted ⁺O(n). Thus, O(n)⊂ ⁺O(n). Since nr=trace(r·1)=trace(CC^(T))≧0, where the trace of a matrix is the sum of the diagonal coefficients, r is necessarily non-negative and r=0

C=0. So ⁺O(n)−{0} forms a group under matrix multiplication.

If C is orthogonal, then det(C)²=det(C)det(C^(T))=det(CC^(T))=det(1)=1 so det(C)=±1. An orthogonal matrix with det(C)=1 is called “special orthogonal,” and the set of (n×n) special orthogonal matrices (which is also a group) is denoted SO(n).

Analogously, an extended orthogonal matrix C is defined to be “special extended orthogonal” if det(C)≧0 and denote the set of special extended orthogonal matrices by S⁺O(n). Again SO(n)⊂S⁺O(n) and S⁺O(n)−{0} forms a group under multiplication.

It is observed that CεS⁺O(n) if and only if C=0 or (det(C)>0 and

$\frac{1}{\,^{n}\sqrt{\det(C)}}$ CεSO(n)). This implies that every CεS⁺O(n) has a unique representation C=sR, sε

,s≧0, RεSO(n) and conversely. In particular, SO(n)={CεS⁺O(n)|det(C)=1}.

It can also be shown that a (2×2) real matrix C is special extended orthogonal if and only if it is of the form:

${C = \begin{pmatrix} a & b \\ {- b} & a \end{pmatrix}},a,{b \in R},$ which are precisely the matrices with which

represents. Thus this representation of

is denoted by the S⁺O(2) representation.

In particular, the unit circle S¹={(x₁,x₂)ε

²; x₁ ²+x₂ ²=1}≈{zε

; |z|²=1} is isomorphic to the real rotation group SO(2) by means of the representation

.

Instead of representing i by

$\begin{pmatrix} 0 & 1 \\ {- 1} & 0 \end{pmatrix},$ it could be represented by

$\begin{pmatrix} 0 & {- 1} \\ 1 & 0 \end{pmatrix},$ and nothing in the arithmetic would differ. This is precisely the same phenomenon as in linear algebra in which it is more satisfactory in an abstract sense to define vector spaces merely by the laws they satisfy but in which computation is best performed in coordinate form by selecting some arbitrary basis.

A three-component analog of complex numbers (i.e., “triplets”) provides a useful arithmetic structure on three-dimensional space, just as the complex numbers put a useful arithmetic structure on two-dimensional space. The theory of addition and scalar multiplication for triplets, are as follows: (a,b,c)+(d,e,ƒ)=(a+d,b+e,c+ƒ) s·(a,b,c)=(s·a,s·b,s·c)

However, multiplying triplets is more difficult. Two ways of multiplication exist: dot product, cross product (i.e., vector product). The dot product (or the scalar product) is as follows: (a,b,c)·(d,e,ƒ)=ad+be+cƒ However, this product does not produce a triplet.

The other way is known as the cross product is as follows: (a,b,c)×(d,e,ƒ)=(bƒ−ce,cd−aƒ,ae−bd).

The cross product has the advantage of producing a triplet from a pair of triplets, but fails to allow division. When A,B are triplets, the equation A×X=B is generally not solvable for X even when A≠0. However, the cross product contained the seed of the eventual solution in the anti-commutative law A×B=−B×A.

It is noted that three-dimensional space must be supplemented with a fourth temporal or scale dimension in order to form a complete system. Thus, 3-dimensional geometry must be embedded inside a (3+1)-dimensional geometry in order to have enough structure to allow certain types of objects (points at infinity, reciprocals of triplets, etc.) to exist.

The four-component objects named “quaternions,” have the usual addition and scalar multiplication laws. The definition of quaternion multiplication is as follows: (a,b,c,d)·(e,ƒ,g,h)=(ae−bƒ−cg−dh,aƒ+be+ch−dg,ag+ce+dƒ−bh,ah+bg+de−cƒ)

Because of the complexity, this formula is not used for computation.

As with the representation of complex numbers as couplets, the first step is to define the units: 1=(1,0,0,0) I=(0,1,0,0) J=(0,0,1,0) K=(0,0,0,1)

The previous formula then shows that I,J,K satisfy the multiplication rules: I ² =J ² =K ² =IJK=−1.

From these relations follow the permutation laws: IJ=−JI=K JK=−KJ=I KI=−IK=J and since 1a+Ib+Jc+Kd=(a,b,c,d)=a1+bI+cJ+cK, the usual laws of arithmetic combined with the above relations among the units defines quaternion multiplication completely. The quaternions is denoted as

.

A quaternion has many representations, the most basic being the 4-vector form q=a1+bI+cJ+cK. Typically, the “1” is omitted (or identified with the number 1 where no ambiguity will result): q=a+bI+cJ+cK.

q=a+bI+cJ+cK naturally decomposes into its scalar part Sc(q)=aε

and its vector (or principal) part Vc(q)=(bI+cJ+dK)ε

³, where the quaternion units I,J,K are regarded as unit vectors in

³ forming a right-hand orthogonal basis.

q=Sc(q)+Vc(q) always holds. The expression, q=a+{right arrow over (ν)}, is used to indicate Sc(q)=a and Vc(q)={right arrow over (ν)}. This can be referred to as the (3+1)-vector representation of a quaternion.

The addition and scalar multiplication laws in the (3+1) form are simply (a+{right arrow over (ν)})+(b+{right arrow over (w)})=(a+b)+({right arrow over (ν)}+{right arrow over (w)}) s·(a+{right arrow over (ν)})=(s·a+s·{right arrow over (ν)}),sε

However, the quaternion multiplication law in (3+1) form reveals the deep connection to the structure of three-dimensional space: (a+{right arrow over (ν)})·(b+{right arrow over (w)})=(ab−{right arrow over (ν)}•{right arrow over (w)})+(a{right arrow over (w)}+b{right arrow over (ν)})+({right arrow over (ν)}×{right arrow over (w)}).

In the above expression, {right arrow over (ν)}•{right arrow over (w)} denotes dot product (cI+dJ+eK)═(ƒI+gJ+hK)=(cƒ+dg+eh) while {right arrow over (ν)}×{right arrow over (w)} denotes cross product

${\left( {{cI} + {dJ} + {eK}} \right) \times \left( {{fI} + {gJ} + {hK}} \right)} = {{\begin{matrix} c & f & I \\ d & g & J \\ e & h & K \end{matrix}} = {{\left( {{dh} - {eg}} \right)I} + {\left( {{ef} - {ch}} \right)J} + {\left( {{cg} - {df}} \right){K.}}}}$

Since ab is ordinary scalar multiplication and a{right arrow over (w)},b{right arrow over (ν)} are just ordinary multiplications of a vector by a scalar, it can be seen that quaternion multiplication contains within it all four ways in which a pair of (3+1)-vectors can be multiplied.

It is suggestive that if the two relativistic spacetime intervals (Δx₁,Δy₁,Δz₁,cΔt₁),(Δx₂,Δy₂,Δz₂,cΔt₂) is represented by the quaternions Δq ₁ =cΔt ₁+(Δx ₁)I+(Δy ₁)J+(Δz ₁)K, Δq ₂ =cΔt ₂+(Δx ₂)I+(Δy ₂)J+(Δz ₂)K then Sc(Δq ₁ ·Δq ₂)=c ²(Δt ₁ Δt ₂)−(Δx ₁ Δx ₂ +Δy ₁ Δy ₂ +Δz ₁ Δz ₂), the familiar Minkowski scalar product.

The (3+1) product formula also shows that for any pure vector {right arrow over (ν)}, {right arrow over (ν)}²=−|{right arrow over (ν)}|²ε

. In particular, when {circumflex over (ν)} is an ordinary unit vector in 3-space, {circumflex over (ν)}²=−1, which generalizes the rules for I,J,K.

As with the complex numbers, quaternions have a conjugation operation q*: q*=(a+bI+cJ+dK)*=(a−bI−cJ−dK).

In (3+1) form this is (a+{right arrow over (ν)})*=(a−{right arrow over (ν)}). Generalizing the

-formulae

${\left( z^{*} \right)^{*} = z},{{{Re}(z)} = {\frac{1}{2}\left( {z + z^{*}} \right)}},{{i\mspace{11mu}{{Im}(z)}} = {\frac{1}{2}\left( {z - z^{*}} \right)}}$ yields the following: (q*)*=q

$\begin{matrix} {{S\;{c(q)}} = {\frac{1}{2}{\left( {q + q^{*}} \right).}}} \\ {{V\;{c(q)}} = {\frac{1}{2}\left( {q - q^{*}} \right)}} \end{matrix}$

Quaternions also have a norm generalizing the complex |z|=√{square root over (zz*)}: |q|=√{square root over (qq*)}=√{square root over (q*q)}=√{square root over ((a ² +b ² +c ² +d ²))}ε

and, as with

, |q|²≧0 and (|q|=0

q=0). In (3+1) form the norm is calculated by |a+{right arrow over (ν)}|=√{square root over (a²+{right arrow over (ν)}•{right arrow over (ν)})}.

A unit quaternion is defined to be a uε

such that |u|=1. It is noted that the quaternion units ±1,±I,±J,±K are all unit quaternions.

The chief peculiarity of quaternion arithmetic is the failure of the commutative law: for quaternions q,r, whereby generally q·r≠r·q; even the units do not commute: I·J=−J·I, etc. The (3+1) form (a+{right arrow over (ν)})·(b+{right arrow over (w)})=(ab−{right arrow over (ν)}•{right arrow over (w)})+(a{right arrow over (w)}+b{right arrow over (ν)})+({right arrow over (ν)}×{right arrow over (w)}) shows this most clearly. All the multiplication operations in this expression are commutative except the cross product {right arrow over (ν)}×{right arrow over (w)} which satisfies {right arrow over (ν)}×{right arrow over (w)}=−{right arrow over (w)}×{right arrow over (ν)}, hence is the source of non-commutativity. This also shows that if Vc(q) and Vc(r) are parallel vectors in

³ then q·r=r·q.

An important formula is the anti-commutative conjugate law (q·r)*=r*·q* which is most easily proved in the (3+1) form. Combined with the previous law (q*)*=q, this shows that conjugation is an anti-involution of

.

Recall that the reciprocal of a non-zero complex number z can be written in the form

$z^{- 1} = \frac{z^{*}}{{z}^{2}}$ and this also holds for quaternions:

${q^{- 1} = \frac{q^{*}}{{q}^{2}}},{q \neq 0}$ as is apparent by the calculation

${q\left( \frac{q^{*}}{{q}^{2}} \right)} = {\frac{q\; q^{*}}{{q}^{2}} = {\frac{{q}^{2}}{{q}^{2}} = 1}}$ and similarly for

$\left( \frac{q^{*}}{{q}^{2}} \right){q.}$

As with all non-commutative groups, inverses anti-commute (q≠0,r≠0)

((qr)⁻¹ =r ⁻¹ q ⁻¹).

So

possesses the four basic arithmetic operations but has a non-commutative multiplication, which is the definition of what is called a division ring.

A known result of Frobenius states that the only division rings which are finite-dimensional extensions of

are

itself (one-dimensional), the complex numbers

(two-dimensional), and the quaternions

((3+1)-dimensional). This is another example of the exceptional properties of (3+1)-dimensional space.

The (n×n) identity matrix

$\quad\begin{pmatrix} 1 & 0 & \cdots & \; & 0 \\ 0 & ⋰ & \; & \; & \vdots \\ \vdots & \; & 1 & \; & \vdots \\ \vdots & \; & \; & ⋰ & 0 \\ 0 & \; & \cdots & 0 & 1 \end{pmatrix}$ is denoted 1 to avoid confusion with the quaternion unit I.

There are many notations for the quaternion units; e.g., i,j,k; î,ĵ,{circumflex over (k)}; and I,J,K. A more general definition of the quaternions, based on is obtained as follows:

Let

be a commutative field and e,ƒ,gε

−{0}.

(

,e,ƒ,g), the quaternions over

, is defined as the smallest

-algebra which contains elements I,J,Kε

(

,e,ƒ,g) satisfying the relations I ² =−eƒ,J ² =−eg,K ² =−ƒg,IJK=−eƒg.

It can then be shown that IJ=−JI=eK JK=−KJ=gI. KI=−IK=ƒJ

Any qε

(

,e,ƒ,g) can be written uniquely in the form q=a+bI+cJ+dK, a,b,c,dε

with conjugate q*=a−bI−cJ−dK and norm ²|q|=a²+eƒb²+egc²+ƒgd².

An interesting situation is when the quadratic form w²+eƒx²+egy²+ƒgz² over

is definite; i.e., (w²+eƒx²+egy²+ƒgz²=0)

(w=x=y=z=0). In particular, for this to hold, none of −eƒ,−eg,−ƒg can be squares in

. In this case,

(

,e,ƒ,g) is a division ring as well as a four-dimensional

-algebra.

(

,1,1,1)=

are just Hamilton's quaternions.

In order to show that

(

,e,ƒ,g) exists, it is noted that the typical polynomial algebra constructions fail because the non-commutativity of the quaternion units.

Let

be a

-algebra, then the tensor algebra of

over

is the graded

-algebra

${T_{k}(A)} = {\coprod\limits_{n \geq 0}\left( {A \otimes_{k}\mspace{11mu}\ldots\mspace{11mu} \otimes_{k}A} \right)_{n\mspace{11mu}{factors}}}$ with product defined on basis elements by (a ₁ {circumflex over (x)} . . . {circle around (x)}a _(m))×(b ₁ {circle around (x)} . . . {circle around (x)} b _(n))=(a ₁ {circle around (x)} . . . {circle around (x)}a _(m) {circle around (x)} b ₁ {circle around (x)} . . . {circle around (x)} b _(n)).

It is noted (

{circle around (x)}_(k) . . . {circle around (x)}_(k)

)_(0 factors)=

by definition.

For e,ƒ,gε

−{0}, define the quaternion

-algebra

(

,e,ƒ,g) to be

${{{\mathbb{H}}\left( {{??},e,f,g} \right)} = \frac{T_{??}\left( {??}^{3} \right)}{\Theta\left( {{??},e,f,g} \right)}},$ where, defining I=(1,0,0),J=(0,1,0),K=(0,0,1), Θ(

,e,ƒ,g) is the two-sided ideal generated by eƒ+{circle around (x)}I eg+J{circle around (x)}J ƒg+K{circle around (x)}K eƒg+I{circle around (x)}J{circle around (x)}K

The quaternion units {±1,±I,±J,±K} form a non-abelian group

of order 8 under multiplication. By expressing

as {1,1′,I,I′,J,J′,K,K′}, then the quaternions over any commutative field

can be abstractly represented as the quotient

(

)=

[

]/Θ, where

[

] is the group ring and Θ is the two-sided ideal generated by 1+1′,I+I′,J+J′,K+K′.

There are many extensions

⊃

which are fields. For example, the field of formal quotients

$\frac{a_{0} + {a_{1}x} + \ldots + {a_{n}x^{n}}}{b_{0} + {b_{1}x} + \ldots + {b_{m}x^{m}}},$ a₀, a₁, . . . , a_(n), b₀, b₁, . . . , b_(m)ε

. However, Frobenius' Theorem asserts that none of these can be finite-dimensional as vector spaces over

.

Just as there are S⁺O(2) representations for the complex numbers, there are comparable representations for the quaternions. These are especially important because there are certain procedures, such as extracting the eigenstructure of quaternion matrices, which are nearly impossible except in these representations.

It is noted that an (n×n) complex matrix Q is called unitary if QQ*=Q*Q=1. Q* denotes the conjugate transpose also called the hermitian conjugate (which is sometimes denoted Q^(H)):

$\begin{pmatrix} z_{11} & \; & \cdots & \; & z_{1n} \\ \; & ⋰ & \; & \ddots & \; \\ \vdots & \; & z_{i\; j} & \; & \vdots \\ \; & \ddots & \; & ⋰ & \; \\ z_{n1} & \; & \cdots & \; & z_{n\; n} \end{pmatrix}^{*} = {\begin{pmatrix} z_{11}^{*} & \; & \cdots & \; & z_{n1}^{*} \\ \; & ⋰ & \; & \ddots & \; \\ \vdots & \; & z_{i\; j}^{*} & \; & \vdots \\ \; & \ddots & \; & ⋰ & \; \\ z_{1n}^{*} & \; & \cdots & \; & z_{n\; n}^{*} \end{pmatrix}.}$

It is noted when Q is real, Q*=Q^(T). The group of (n×n) unitary matrices is denoted U(n). Thus O(n)⊂U(n).

As with the orthogonal matrices, a complex matrix Q is termed “extended unitary” if the more general rule QQ*=Q*Q=r·1,rε

holds and denote the (n×n) extended unitary matrices by ⁺U(n). So ⁺O(n)∪⁺U(n)⊂ ⁺U(n) and ⁺U(n)−{0} is a group under multiplication.

A unitary matrix Q is special unitary if det(Q)=1 and analogously an extended unitary matrix Q is special extended unitary if det(Q)≧0. The special extended unitary matrices are denoted S⁺U(n); thus, (S⁺O(n)∪SU(n))⊂S⁺U(n), and S⁺U(n)−{0} is a group under multiplication.

As with S⁺O(n), it is straight forward to calculate that QεS⁺U(n) if and only if Q=0 or (det(Q)ε

,det(Q)>0 and

$\frac{1}{\sqrt[n]{\det(Q)}}$ QεSU(n)). This implies that every QεS⁺U(n) has a unique representation Q=sU, sε

,s≧0, UεSU(n) and conversely.

It can be shown that a (2×2) complex matrix Q is special extended unitary if and only if it is of the form:

${Q = \begin{pmatrix} z_{+} & z_{-} \\ {- z_{-}^{*}} & z_{+}^{*} \end{pmatrix}},z_{+},{z_{-} \in {C.}}$

Defining

${{〚\;{z_{+} + {z_{-}J}}\; 〛} = \begin{pmatrix} z_{+} & z_{-} \\ {- z_{-}^{*}} & z_{+}^{*} \end{pmatrix}},$ it can be shown, using the laws of quaternion arithmetic in the bicomplex representation, that

converts all the algebraic operations in

into matrix operations.

is called the S⁺U(2) representation.

Moreover, the S⁺U(2) representation sends conjugation to hermitian conjugation and the squared norm to the determinant:

${{{{〚\left( {z_{+} + {z_{-}J}} \right)^{*}〛} = {{〚{z_{+}^{*} - {z_{-}J}}〛} = {\begin{pmatrix} z_{+}^{*} & {- z_{-}} \\ z_{-}^{*} & z_{+} \end{pmatrix} = {\begin{pmatrix} z_{+} & z_{-} \\ {- z_{-}^{*}} & z_{+}^{*} \end{pmatrix}^{*} = {〚{z_{+} + {z_{-}J}}〛}^{*}}}}}}z_{+}} + {z_{-}J{^{2} = }z_{+}{^{2} + }z_{-}{^{2}{= {{\det\begin{pmatrix} z_{+} & z_{-} \\ {- z_{-}^{*}} & z_{+}^{*} \end{pmatrix}} = {{\det 〚{z_{+} + {z_{-}J}}〛}.}}}}}$ In particular, the unit 3-sphere S ³={(x ₁ ,x ₂ ,x ₃ ,x ₄)ε

⁴ ;x ₁ ² +x ₂ ² +x ₃ ² +x ₄ ²=1}≈{qε

;|q| ²=1} is isomorphic to the spin group SU(2) by means of the representation

.

The unit quaternions {qε

; |q|²=1} is denoted

⊂

. In terms of the (3+1) form of quaternions, the S⁺U(2) representation is

${〚\;{a + {bI} + {cJ} + {cK}}\; 〛} = {\begin{pmatrix} {a + {bi}} & {c + {di}} \\ {{- c} + {di}} & {a - {bi}} \end{pmatrix}.}$

Decomposing the matrix

a+bI+cJ+cK

yields

$\begin{matrix} {{〚\;{a + {bI} + {cJ} + {cK}}\; 〛} = \begin{pmatrix} {a + {bi}} & {c + {di}} \\ {{- c} + {di}} & {a - {bi}} \end{pmatrix}} \\ {= {{a\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}} + {b\begin{pmatrix} i & 0 \\ 0 & {- i} \end{pmatrix}} + {c\begin{pmatrix} 0 & 1 \\ {- 1} & 0 \end{pmatrix}} + {d\begin{pmatrix} 0 & i \\ i & 0 \end{pmatrix}}}} \end{matrix}$ and, thus,

${{〚\; 1\; 〛} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}},{{〚\; I\; 〛} = \begin{pmatrix} i & 0 \\ 0 & {- i} \end{pmatrix}},\;{\quad{{{〚J\; 〛} = \begin{pmatrix} 0 & 1 \\ {- 1} & 0 \end{pmatrix}},{{〚\; K\; 〛} = {\begin{pmatrix} 0 & i \\ i & 0 \end{pmatrix}.}}}}$

The above are denoted as the standard units of the S⁺U(2) representation.

It is also easy to extend the S⁺U(2) representation to m×n quaternion matrices component wise:

${〚\begin{pmatrix} \; & \vdots & \; & \; \\ \ldots & a_{ij} & \ldots & \ldots \\ \; & \vdots & \; & \; \end{pmatrix}〛} = {\begin{pmatrix} \; & \vdots & \; & \; \\ \ldots & {〚a_{ij}〛} & \ldots & \ldots \\ \; & \vdots & \; & \; \end{pmatrix}.}$

This representation will preserve all the additive and multiplicative properties of quaternion matrices.

Assuming {circumflex over (α)}ε

³ is a unit vector and θε

be an angle, then the quaternion is defined as follows:

$u = {{u\left( {\theta,\hat{\alpha}} \right)} = {{\cos\;\frac{\theta}{2}} + {\left( {\sin\;\frac{\theta}{2}} \right){\hat{\alpha}.}}}}$

For all vectors {right arrow over (ν)}ε

³, the quaternion product u{right arrow over (ν)}u* is also a vector and is the right-handed rotation of {right arrow over (ν)} about the axis {circumflex over (α)} by angle θ. It is noted u(θ,{circumflex over (α)}) is always a unit quaternion; i.e., u(θ,{circumflex over (α)})ε

.

This result has found uses in, for example, computer animation and orbital mechanics because it reduces the work required to compound rotations: a series of rotations (θ₁,{circumflex over (α)}₁), . . . ,(θ_(k),{circumflex over (α)}_(k)) can be represented by the quaternion product u(θ_(k),{circumflex over (α)}_(k)) . . . u(θ₁,{circumflex over (α)}₁) which is much more efficient to compute than the product of the associated rotation matrices. Moreover, by inverting the map (θ,{circumflex over (α)})

(θ,{circumflex over (α)}) the resultant angle and axis of this series of rotations can be calculated: (θ_(net),{circumflex over (α)}_(net))=u ⁻¹ [u(θ_(k),{circumflex over (α)}_(k)) . . . u(θ₁,{circumflex over (α)}₁)], which is simpler than computing the eigenstructure of the product rotation matrix.

If q=a+{right arrow over (ν)} is an arbitrary quaternion and uε

then uqu*=u(a+{right arrow over (ν)})u*=auu*+u{right arrow over (ν)}u*=a+u{right arrow over (ν)}u* so that rotation by u leaves Sc(q) unchanged. In particular, when qε

, uqu*=q so rotation leaves

⊂

invariant. Thus ulu*=1.

Also u(q+r)u*=uqu*+uru* u(qr)u*=u(q(u*u)r)u*=(uqu*)(uru*) (uqu*=r)

(q=u*ru).

The conclusion is that the rotation map q

(uqu*) is an algebraic automorphism of

i.e., a structure-preserving one-to-one correspondence.

Assuming {right arrow over (u)},{right arrow over (ν)} are non-parallel vectors of the same length, then there is at least one rotation of

³ which sends {right arrow over (u)} to {right arrow over (ν)}. Any unit vector {circumflex over (α)} which lies on the plane of points which are equidistant from the tips of {right arrow over (u)},{right arrow over (ν)} can be used as an axis for a rotation which sends {right arrow over (u)} to {right arrow over (ν)}.

As {right arrow over (u)} is rotated around one of these axes, the tip of {right arrow over (u)} moves in a circle which lies in the sphere centered at the origin and passing through the tips of {right arrow over (u)},{right arrow over (ν)}. Generally this is a small circle on this sphere. However, there are two unit vectors {circumflex over (α)} around which the tip of {right arrow over (u)} moves in a great circle; namely

${\hat{\alpha} = {\pm \frac{\overset{\rightarrow}{u} \times \overset{\rightarrow}{v}}{{\overset{\rightarrow}{u} \times \overset{\rightarrow}{v}}}}},$ the unique unit vectors perpendicular to both {right arrow over (u)} and {right arrow over (ν)}.

When rotated around such {circumflex over (α)}, the tip of {right arrow over (u)} moves along either the longest or shortest path between the tips depending on the orientations. In either case, this path is an external of the length of the paths. Any unit vector around which {right arrow over (u)} can be rotated into {right arrow over (ν)} along an external path is referred to as an “external unit vector.” Clearly {circumflex over (α)} is an external unit vector, then so is −{circumflex over (α)}.

When {right arrow over (u)}={right arrow over (ν)}≠{right arrow over (0)}, the external vectors are

$\hat{\alpha} = {\pm \frac{\overset{\rightarrow}{u}}{\overset{\rightarrow}{u}}}$ since any rotation fixing {right arrow over (u)} must have the line containing {right arrow over (u)} as an axis. When {right arrow over (u)}=−{right arrow over (ν)}≠{right arrow over (0)}, the external vectors are all unit vectors in the plane perpendicular to {right arrow over (u)}. When {right arrow over (u)}={right arrow over (ν)}={right arrow over (0)}, the external vectors are all unit vectors.

Now, it is assumed that {circumflex over (α)},{circumflex over (β)},{circumflex over (γ)} and {circumflex over (α)}′,{circumflex over (β)}′,{circumflex over (γ)}′ are two right-handed, orthonormal systems of vectors: {circumflex over (α)}⊥{circumflex over (β)}, |{circumflex over (α)}|={circumflex over (β)}|=1, {circumflex over (γ)}={circumflex over (α)}×{circumflex over (β)} and similarly for {circumflex over (α)}′,{circumflex over (β)}′,{circumflex over (γ)}′. To simplify the analysis, that it is further assumed that {circumflex over (α)},{circumflex over (α)}′ are not parallel and {circumflex over (β)},{circumflex over (β)}′ are not parallel.

As discussed above, all the rotations sending {circumflex over (α)} to {circumflex over (α)}′ determine a plane and similarly for the rotations sending {circumflex over (β)} to {circumflex over (β)}′. Assuming these planes are not the same, they will intersect in a line through the origin. There is then a unique rotation around this line (and only around this line) which will simultaneously send {circumflex over (α)} to {circumflex over (α)}′ and {circumflex over (β)} to {circumflex over (β)}′. Since {circumflex over (γ)}={circumflex over (α)}×{circumflex over (β)} and {circumflex over (γ)}′={circumflex over (α)}′×{circumflex over (α)}′, this rotation also sends {circumflex over (γ)} to {circumflex over (γ)}′.

By carefully analyzing the various cases when parallelism occurs, the following can be shown:

Proposition 1 For any two right-handed, orthonormal systems of vectors {circumflex over (α)},{circumflex over (β)},{circumflex over (γ)} and {circumflex over (α)}′,{circumflex over (β)}′,{circumflex over (γ)}′, there is a unit quaternion uε

such that {circumflex over (α)}′=u{circumflex over (α)}u* {circumflex over (β)}′=u{circumflex over (β)}u*, {circumflex over (γ)}′=u{circumflex over (α)}u* Moreover, u is unique up to sign: ±u will both work.

The sign ambiguity is easy to understand:

$u = {{u\left( {\theta,\hat{\alpha}} \right)} = {{\cos\frac{\theta}{2}} + \left( {\sin\frac{\theta}{2}} \right)}}$ {circumflex over (α)} is the rotation around {circumflex over (α)} by angle θ while

$\begin{matrix} {{- u} = {{{- \cos}\frac{\theta}{2}} - {\left( {\sin\frac{\theta}{2}} \right)\hat{\alpha}}}} \\ {= {{\cos\left( \frac{{2\;\pi} - \theta}{2} \right)} + {{\sin\left( \frac{{2\;\pi} - \theta}{2} \right)}\left( {- \hat{\alpha}} \right)}}} \\ {= {u\left( {\left( {{2\;\pi} - \theta} \right),{- \hat{\alpha}}} \right)}} \end{matrix}$ is the rotation around −{circumflex over (α)} by angle (2π−θ). However, these are geometrically identical operations.

Because of the automorphism properties, if uε

and the following is defined I′=uIu* J′=uJu* K′=uKu* then the relations I′ ² =J′ ² =K′ ² =I′J′K′=−1 I′J′=K′,J′K′=I′,K′I′=J′ will hold. This means the new units I′,J′,K′ are algebraically indistinguishable form the old units I,J,K.

Therefore, any right-handed, orthonormal system of unit vectors can function as the quaternion units.

As a result of this, neither the bicomplex nor the S⁺U(2) representations are unique. For example, it was mentioned previously that any of the maps (a+bi)

(a+bI) (a+bi)

(a+bJ) (a+bi)

(a+bK) could be used to define a distinct embedding

⊂

hence induces a distinct bicomplex representation of

.

All of these arise by cyclically permuting the units: I,J,K→J,K,I→K,I,J which can be accomplished by the rotation quaternion

$u = {\frac{1}{\sqrt{3}}{\left( {I + J + K} \right).}}$ In fact, there are exactly 24 different right-hand systems that can be selected from {±I,±J,±K}, any of which can function as a quaternion basis, and all of which are obtained by some rotation quaternion of the form

$u = {\frac{1}{\sqrt{3}}{\left( {{I \pm J} \pm K} \right).}}$

In other words, if UεSU(2), then

$\begin{matrix} {{〚\;{a + {b\; I} + {c\; J} + {c\; K}}\; 〛}_{U} = {{a\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}} + {b\left\lbrack {{U\begin{pmatrix} i & 0 \\ 0 & {- i} \end{pmatrix}}U^{*}} \right\rbrack} +}} \\ {{c\left\lbrack {U\begin{pmatrix} 0 & 1 \\ {- 1} & 0 \end{pmatrix}U^{*}} \right\rbrack} +} \\ {d\left( {U\begin{pmatrix} 0 & i \\ i & 0 \end{pmatrix}U^{*}} \right\rbrack} \end{matrix}$ is a valid S⁺U(2) representation.

This illustrates the additional richness of the quaternions over the complex numbers: the only non-trivial

-invariant automorphism of

is complex conjugation but

has a distinct automorphism for each unit {±u}⊂

.⁶

Assuming a is an n×n matrix over

. a is called normal if it commutes with its conjugate: aa*=a*a. Important classes of normal matrices include the following:

Hermitian (or symmetric or self-adjoint): a*=a

Anti-hermitian (or anti-symmetric): a*=−a

Unitary (or orthogonal): a*=a⁻¹

Non-negative: a=bb* for some b

Semi-positive: a is non-negative and a≠0

A projection: a²=a*=a

It is a classic result that any normal matrix a can be diagonalized by a unitary matrix; that is, there is a unitary matrix u and a diagonal matrix

$\lambda = \begin{pmatrix} \lambda_{1} & \; & \; & \; \\ \; & \lambda_{2} & \; & \; \\ \; & \; & ⋰ & \; \\ \; & \; & \; & \lambda_{n} \end{pmatrix}$ such that u*au=λ.

λ₁, λ₂, . . . , λ_(n)ε

are the eigenvalues of a and the columns of u form an orthonormal basis for

^(n) with the inner product

$\left\langle {\overset{\rightarrow}{x},\overset{\rightarrow}{y}} \right\rangle = {\sum\limits_{v}{x_{v}{y_{v}^{*}.}}}$

The standard normal classes can be characterized by the properties of λ₁, λ₂, . . . , λ_(n):

Hermitian

λ₁, λ₂, . . . , λ_(n)ε

Anti-hermitian

${\frac{1}{i}\lambda_{1}},{\frac{1}{i}\lambda_{2}},\;\ldots\mspace{11mu},{\frac{1}{i}\lambda_{n}},{\in R}$

Unitary

|λ₁|=|λ₂|= . . . =|λ_(n)|=1

Non-negative

λ₁, λ₂, . . . , λ_(n)ε

and λ₁, λ₂, . . . , λ_(n)≧0

Semi-positive

λ₁, λ₂, . . . , λ_(n)ε

and for some ν, λ_(ν)>0.

A projection

λ₁, λ₂, . . . , λ_(n)ε{0,1}

In particular, it is noted that any real normal matrix aε

^(n×n) will generally have complex eigenvalues and eigenvectors. In the special case that a is symmetric (a^(T)=a), a can be diagonalized by a real orthogonal matrix and has real diagonal entries.

The first step in quaternion modeling is to generalize this result to

; i.e., to show that any normal quaternion matrix a can be diagonalized by a unitary quaternion matrix. In fact, it can be shown that the eigenvalues are in

⊂

. This latter fact is important because it means the characteristic polynomial p_(a)(λ)=det(λ1−a) need not be discussed, which, as mentioned above, is badly behaved over

. This also implies that the same classification of the normal types based on the properties of λ₁, λ₂, . . . , λ_(n)ε

works for quaternion matrices as well.

This can be regarded as the Fundamental Theorem of quaternions because it has so many important consequences. In particular, in the case n=1, this will yield the polar representation of a quaternion, which is the basis for quaternion spatial modeling.

As pointed out above, parts of standard linear algebra do not work over

. However, linear independence and the properties of span( ) in

^(n) work the same way as in

^(n) except that the left scalar multiplication needs to be distinguished from the right scalar multiplication. Because

is a division ring, the following lemmas result:

Lemma 1 Let {right arrow over (w)}, {right arrow over (ν)}₁, . . . , {right arrow over (ν)}_(l)ε

^(n) and suppose {{right arrow over (ν)}₁, . . . , {right arrow over (ν)}_(l)} is linearly independent but {{right arrow over (w)}, {right arrow over (ν)}₁, . . . , {right arrow over (ν)}_(l)} is linearly dependent, then {right arrow over (w)}_(k)εspan({right arrow over (ν)}₁, . . . , {right arrow over (ν)}_(l)).

Lemma 2 Let {right arrow over (w)}₁, . . . , {right arrow over (w)}_(k), {right arrow over (ν)}₁, . . . , {right arrow over (ν)}_(l)ε

^(n) such that {right arrow over (w)}₁, . . . , {right arrow over (w)}_(k)εspan({right arrow over (ν)}₁, . . . , {right arrow over (ν)}_(l)) and k>l, then {{right arrow over (w)}₁, . . . , {right arrow over (w)}_(k)} is linearly dependent.

These lemmas imply all the usual results concerning bases and dimension including the fact that any linearly independent set can be extended to a basis for

^(n).

The inner product yields:

$\left\langle {\overset{\rightarrow}{x},\overset{\rightarrow}{y}} \right\rangle = {\left\langle {\begin{pmatrix} \begin{matrix} x_{1} \\ \vdots \end{matrix} \\ x_{n} \end{pmatrix},\begin{pmatrix} \begin{matrix} y_{1} \\ \vdots \end{matrix} \\ y_{n} \end{pmatrix}} \right\rangle = {\underset{v = 1}{\sum\limits^{n}}{x_{v}y_{v}^{*}}}}$ which satisfies the usual properties of the inner product over

^(n) including

{right arrow over (x)},{right arrow over (x)}

=0

({right arrow over (x)}=0) and

q{right arrow over (x)},{right arrow over (y)}

=q·

{right arrow over (x)},{right arrow over (y)}

, qε

. Perpendicularity is defined by ({right arrow over (x)}⊥{right arrow over (y)})

{right arrow over (x)},{right arrow over (y)}

=0.

Lemma 3 (Projection Theorem for

) Let {right arrow over (ν)}₁, . . . , {right arrow over (ν)}_(l)ε

^(n), then for all {right arrow over (w)}ε

^(n), there exist q₁, . . . q_(l)ε

and a unique {right arrow over (e)}ε

^(n) such that {right arrow over (w)}q₁{right arrow over (ν)}_(l)+ . . . q_(l){right arrow over (ν)}_(l)+{right arrow over (e)} and {right arrow over (e)}⊥{right arrow over (ν)}₁, . . . , {right arrow over (ν)}_(l). If {{right arrow over (ν)}₁, . . . , {right arrow over (ν)}_(l)} is linearly independent, then q₁, . . . q_(l) are also unique.

Using the Projection Theorem, it can be shown that

^(n) has an orthonormal basis and, in fact, any orthonormal set {{right arrow over (ν)}₁, . . . , {right arrow over (ν)}_(l)} can be extended to an orthonormal basis.

The matrix u of change-of-basis to any orthonormal set is unitary and thus the matrix g of any linear operator

n ⁢ -> G ⁢ n is transformed to ugu* by the basis change.

Let

$a = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ be a 2×2 matrix over

. Define the matrix

$a^{\diamond} = \begin{pmatrix} d^{*} & {- c^{*}} \\ {- b^{*}} & a^{*} \end{pmatrix}$ and suppose

${{a\begin{pmatrix} u \\ v \end{pmatrix}} = \begin{pmatrix} x \\ y \end{pmatrix}},{{{then}\mspace{14mu}{a^{\diamond}\begin{pmatrix} {- v^{*}} \\ u^{*} \end{pmatrix}}} = {{\begin{pmatrix} d^{*} & {- c^{*}} \\ {- b^{*}} & a^{*} \end{pmatrix}\begin{pmatrix} {- v^{*}} \\ u^{*} \end{pmatrix}} = {\begin{pmatrix} {- \left( {{cu} + {dv}} \right)^{*}} \\ \left( {{a\; u} + {bv}} \right)^{*} \end{pmatrix} = {\begin{pmatrix} {- y^{*}} \\ x^{*} \end{pmatrix}.}}}}$ Next it is noted that for any

${\begin{pmatrix} z_{+} & z_{-} \\ {- z_{-}^{*}} & z_{+}^{*} \end{pmatrix} \in {S^{+}{U(2)}}},{\begin{pmatrix} z_{+} & z_{-} \\ {- z_{-}^{*}} & z_{+}^{*} \end{pmatrix}^{0} = {\begin{pmatrix} \left( z_{+}^{*} \right)^{*} & {- \left( {- z_{-}^{*}} \right)^{*}} \\ {- \left( z_{-} \right)^{*}} & \left( z_{+} \right)^{*} \end{pmatrix} = {\begin{pmatrix} z_{+} & z_{-} \\ {- z_{-}^{*}} & z_{+}^{*} \end{pmatrix}.}}}$ Thus, the following lemma results:

Lemma 4 Let qε

and

$\begin{pmatrix} u \\ v \end{pmatrix},{\begin{pmatrix} x \\ y \end{pmatrix} \in C^{2}}$ such that

${{{〚\; q\; 〛}\begin{pmatrix} u \\ v \end{pmatrix}} = \begin{pmatrix} x \\ y \end{pmatrix}},$ then

${{〚\; q\; 〛}\begin{pmatrix} {- v^{*}} \\ u^{*} \end{pmatrix}} = {\begin{pmatrix} {- y^{*}} \\ x^{*} \end{pmatrix}.}$

It is noted that this result is independent of which form of

is used. However, the next result requires selecting a specific form:

Proposition 2 It is assumed that a be an n×n quaternion matrix and {right arrow over (w)}ε

^(2n)−{{right arrow over (0)}} is an eigenvector of the standard representation

a

with eigenvalue λε

, {right arrow over (w)} can be written in the form

$\overset{\rightarrow}{w} = {\begin{pmatrix} u_{1} \\ v_{1} \\ \vdots \\ u_{n} \\ v_{n} \end{pmatrix}.}$ Also, λε

can be identified with λε

by replacing iε

by Iε

; then

${a\begin{pmatrix} {u_{1} - {Jv}_{1}} \\ \vdots \\ {u_{n} - {Jv}_{n}} \end{pmatrix}} = {\begin{pmatrix} {u_{1} - {Jv}_{1}} \\ \vdots \\ {u_{n} - {Jv}_{n}} \end{pmatrix} \cdot {\lambda.}}$ Writing

a

and

$\overset{\rightarrow}{w} = \begin{pmatrix} u_{1} \\ v_{1} \\ \vdots \\ u_{n} \\ v_{n} \end{pmatrix}$ in blocks as

${{〚\; a\; 〛} = {{\begin{pmatrix} \mspace{56mu} & \; & \mspace{56mu} \\ \; & {〚\; a_{kl}〛} & \; \\ \; & \; & \; \end{pmatrix}\mspace{14mu}{and}\mspace{14mu}\overset{->}{w}} = \begin{pmatrix} u_{1} \\ v_{1} \\ \; \\ \; \\ u_{n} \\ v_{n} \end{pmatrix}}},$ the equation a{right arrow over (w)}={right arrow over (w)}λ is seen to be

${{\sum\limits_{l = 1}^{n}{{〚\; a_{kl}〛}\begin{pmatrix} u_{l} \\ v_{l} \end{pmatrix}}} = {{\begin{pmatrix} u_{k} \\ v_{k} \end{pmatrix}\lambda} = \begin{pmatrix} {u_{k}\lambda} \\ {v_{k}\lambda} \end{pmatrix}}},$ k=1, . . . , n.

By Lem. 3,

${{{{\sum\limits_{l = 1}^{n}{{〚\; a_{kl}\; 〛}\begin{pmatrix} {- v_{l}^{*}} \\ u_{l}^{*} \end{pmatrix}}} = {\begin{pmatrix} {{- v_{k}^{*}}\lambda^{*}} \\ {u_{k}^{*}\lambda^{*}} \end{pmatrix} = {\begin{pmatrix} {- v_{k}^{*}} \\ u_{k}^{*} \end{pmatrix} \cdot \lambda^{*}}}},{k = 1},\cdots\mspace{11mu},{\left. n\Rightarrow{\sum\limits_{l = 1}^{n}{{〚\; a_{kl}\; 〛}\begin{pmatrix} u_{l} & {- v_{l}^{*}} \\ v_{l} & u_{l}^{*} \end{pmatrix}}} \right. = {\begin{pmatrix} u_{k} & {- v_{k}^{*}} \\ v_{k} & u_{k}^{*} \end{pmatrix} \cdot \begin{pmatrix} \lambda & 0 \\ 0 & \lambda^{*} \end{pmatrix}}},{k = 1},\cdots\mspace{11mu},{n.{However}},{\begin{pmatrix} u_{l} & {- v_{l}^{*}} \\ v_{l} & u_{l}^{*} \end{pmatrix} = {\begin{pmatrix} u_{l} & \left( {- v_{l}^{*}} \right) \\ {- \left( {- v_{l}^{*}} \right)^{*}} & u_{l}^{*} \end{pmatrix} = {{u_{l}} + {\left( {- v_{l}^{*}} \right)J}}}}}〛} = {{{〚\;{u_{l} - {J\; v_{l}}}〛}\mspace{14mu}{{and}\begin{pmatrix} \lambda & 0 \\ 0 & \lambda^{*} \end{pmatrix}}} = {{〚\;{\lambda + {0J}}\; 〛} = {〚\;\lambda\; 〛}}}$ in the standard representation.

Therefore

${{\sum\limits_{l = 1}^{n}{a_{kl}\left( {u_{l} - {J\; v_{l}}} \right)}} = {{\left( {u_{k} - {J\; v_{k}}} \right) \cdot \lambda}\mspace{14mu}{in}}}\mspace{14mu}$ $\left. H\Rightarrow{a\begin{pmatrix} {u_{1} - {J\; v_{1}}} \\ \vdots \\ {u_{n} - {J\; v_{n}}} \end{pmatrix}} \right. = {{\begin{pmatrix} {u_{1} - {J\; v_{1}}} \\ \vdots \\ {u_{n} - {J\; v_{n}}} \end{pmatrix} \cdot \lambda}\mspace{14mu}{in}\mspace{14mu}{H^{\underset{\_}{w}}.}}$

It is noted that this proposition shows that if column vectors are used to represent

^(n) then “eigenvalue” must be taken to mean “right eigenvalue”.

Proposition 3 (The Fundamental Theorem): Let a be an n×n normal matrix over

, then there exists an n×n unitary matrix u over

and a diagonal matrix

$\lambda = \begin{pmatrix} \lambda_{1} & \; & \; & \; \\ \; & \lambda_{2} & \; & \; \\ \; & \; & ⋰ & \; \\ \; & \; & \; & \lambda_{n} \end{pmatrix}$ with λ₁, λ₂, . . . , λ_(n)ε

such that u*au=λ. λ is unique up to permutations of the diagonal coefficients.

Let a be normal. Since every matrix over

^(2n) has an eigenvector, Prop. 2 implies that a has an eigenvector {right arrow over (y)}ε

^(n)−{{right arrow over (0)}} with eigenvalue λ₁ε

. By the corollaries to the Projection Theorem, {right arrow over (γ)} can be extended to an orthogonal basis for

^(n). In this basis, a becomes

${{u_{1}^{*}a\; u_{1}} = \begin{pmatrix} \lambda_{1} & q_{2} & \cdots & q_{n} \\ 0 & \; & \; & \; \\ \vdots & \; & a^{\prime} & \; \\ 0 & \; & \; & \; \end{pmatrix}},$ where u₁ is unitary. This matrix is also normal and since

$\begin{matrix} {\begin{matrix} {\begin{pmatrix} \lambda_{1} & q_{2} & \cdots & q_{n} \\ 0 & \; & \; & \; \\ \vdots & \; & a^{\prime} & \; \\ 0 & \; & \; & \; \end{pmatrix}^{*} \cdot} \\ \begin{pmatrix} \lambda_{1} & q_{2} & \cdots & q_{n} \\ 0 & \; & \; & \; \\ \vdots & \; & a^{\prime} & \; \\ 0 & \; & \; & \; \end{pmatrix} \end{matrix} = {\begin{pmatrix} \lambda_{1}^{*} & 0 & \cdots & 0 \\ q_{2}^{*} & \; & \; & \; \\ \vdots & \; & \left( a^{\prime} \right)^{*} & \; \\ q_{n}^{*} & \; & \; & \; \end{pmatrix} \cdot \begin{pmatrix} \lambda_{1} & q_{2} & \cdots & q_{n} \\ 0 & \; & \; & \; \\ \vdots & \; & a^{\prime} & \; \\ 0 & \; & \; & \; \end{pmatrix}}} \\ {{= \begin{pmatrix} {\lambda_{1}}^{2} & {\lambda_{1}^{*}q_{2}} & \cdots & {\lambda_{1}^{*}q_{n}} \\ {q_{2}^{*}\lambda_{1}} & \; & \; & \; \\ \vdots & \; & b & \; \\ {q_{n}^{*}\lambda_{1}} & \; & \; & \; \end{pmatrix}},} \end{matrix}$ for some b, and

$\begin{matrix} {\begin{matrix} {\begin{pmatrix} \lambda_{1} & q_{2} & \cdots & q_{n} \\ 0 & \; & \; & \; \\ \vdots & \; & a^{\prime} & \; \\ 0 & \; & \; & \; \end{pmatrix} \cdot} \\ \begin{pmatrix} \lambda_{1} & q_{2} & \cdots & q_{n} \\ 0 & \; & \; & \; \\ \vdots & \; & a^{\prime} & \; \\ 0 & \; & \; & \; \end{pmatrix}^{*} \end{matrix} = {\begin{pmatrix} \lambda_{1} & q_{2} & \cdots & q_{n} \\ 0 & \; & \; & \; \\ \vdots & \; & a^{\prime} & \; \\ 0 & \; & \; & \; \end{pmatrix} \cdot \begin{pmatrix} \lambda_{1}^{*} & {\; 0\mspace{14mu}} & {\cdots\mspace{11mu}} & {0\;} \\ q_{2}^{*} & \; & \; & \; \\ \vdots & \; & \left( a^{\prime} \right)^{*} & \; \\ q_{n}^{*} & \; & \; & \; \end{pmatrix}}} \\ {{= \begin{pmatrix} {{\lambda_{1}}^{2} + {\sum\limits_{v = 2}^{n}{q_{v}}^{2}}} & {\mspace{40mu} r_{2}\mspace{31mu}} & {\cdots\mspace{14mu}} & {\mspace{31mu} r_{n}\mspace{34mu}} \\ r_{2}^{*} & \; & \; & \; \\ \; & \; & \; & \; \\ \vdots & \; & {a^{\prime}\left( a^{\prime} \right)}^{*} & \; \\ \; & \; & \; & \; \\ r_{n}^{*} & \; & \; & \; \end{pmatrix}},} \end{matrix}$ for some r₂, . . . , r_(n), by equating the corner coefficients, the following is obtained:

${\sum\limits_{v = 2}^{n}{q_{v}}^{2}} = {\left. 0\Rightarrow{{\left( {q_{2} = {\cdots = {q_{n} = 0}}} \right).\mspace{14mu}{Thus}}\mspace{14mu} u_{1}^{*}a\; u_{1}} \right. = \begin{pmatrix} \lambda_{1} & 0 & \cdots & 0 \\ 0 & \; & \; & \; \\ \vdots & \; & a^{\prime} & \; \\ 0 & \; & \; & \; \end{pmatrix}}$ and a′ is normal.

Continuing in the same way on a′, yields,

$\begin{matrix} {{u^{*}a\; u} = {\left( {u_{n}\mspace{11mu}\cdots\mspace{14mu} u_{1}} \right)\mspace{11mu}{A\left( {u_{n}\mspace{11mu}\cdots\mspace{14mu} u_{1}} \right)}^{*}}} \\ {= {u_{n}\mspace{11mu}\cdots\mspace{14mu} u_{1}a\; u_{1}^{*}\mspace{11mu}\cdots\mspace{14mu} u_{n}^{*}}} \\ {= \begin{pmatrix} \lambda_{1} & 0 & \cdots & 0 \\ 0 & ⋰ & \; & \vdots \\ \vdots & \; & ⋰ & 0 \\ 0 & \cdots & 0 & \lambda_{n} \end{pmatrix}} \end{matrix}$ with u=u_(n) . . . u₁ unitary and λ₁, λ₂, . . . , λ_(n)ε

.

The Fundamental Theorem not only establishes the existence of the diagonalization but, when combined with Prop. 1, yields a method for constructing it.

With respect to eigenvalue degeneracy, an (n×n) matrix over a commutative division ring (i.e., a field) can have at most n eigenvalues because its characteristic polynomial can have at most n roots. However, this is no longer true over non-commutative division rings as the following consequence of the Fundamental Theorem shows.

First, let a be an (n×n) normal quaternion matrix and define Eig(a) to be the eigenvalues of a in

.

is identified with the subfield of

by regarding i=I in the usual manner. A set of complex numbers λ₁, λ₂, . . . , λ_(m)ε

∩Eig(a) is defined to be “eigen-generators” for a if they satisfy the following: (i) λ₁, λ₂, . . . , λ_(m) are all distinct; (ii) no pair λ_(k),λ_(l), are complex conjugates of one another; and (iii) the list λ₁, λ₂, . . . , λ_(m)ε

∩Eig(a) cannot be extended without violating (i) or (ii).

Proposition 4 Let a be an (n×n) normal quaternion matrix, then at least one set of eigen-generators λ₁, λ₂, . . . , λ_(m)ε

∩Eig(a) with 1≦m≦n exists. If λ₁, λ₂, . . . , λ_(m)ε

∩Eig(a) is one such, then a quaternion με

is an eigenvalue of a if and only if for some 1≦k≦m, μ=Re(λ_(k))+Im(λ_(k))û, where ûε

³ with |û|=1. Moreover, k is unique and if με

then û is unique as well.

Corollary 1 If μ is a quaternion eigenvalue of a, then so is μ* and qμq⁻¹ for any qε

−{0}.

Corollary 2 If λ₁, λ₂, . . . , λ_(m)ε

∩Eig(a), λ₁′, λ₂′, . . . , λ_(m′)′ε

∩Eig(a) are two sets of eigen-generators then m′=m, 1≦m≦n, and λ₁′, λ₂′, . . . , λ_(m)′ is a permutation of λ₁ ^((±*)), λ₂ ^((±*)), . . . , λ_(m) ^((±*)), where λ^((±*)) denotes exactly one of λ,λ*.

Corollary 3 There is at least one, but no more than n, distinct elements of

∩Eig(a).

Turning now to a discussion of Hermitian-regular rings and compact projections, it is assumed that X is a left A-module, and Y,Z⊂X are submodules. The smallest submodule of X which includes both Y and Z is denoted Y+Z. It is evident that Y+Z={y+z; yεY,zεZ}.

An important special case of this construction is when the following two conditions hold:

(i) Y∩Z={0}

(ii) X=Y+Z.

In this case, every xεX has a unique decomposition of the form x=y+z,yεY,zεZ. The existence is clear by (ii). As for uniqueness, if y+z=x=y′+z′, then y−y′=z′−z and since Y,Z are submodules, then y−y′εY and z′−zεZ, so y−y′=z′−zεY∩Z={0}. Therefore, y=y′ and z=z′ as stated.

When (i) and (ii) hold, then X=Y⊕Z in which X denotes the “(internal) direct sum” of Y,Z.

Now assuming A is a *-algebra and X has a definite inner product on it, a stronger condition on the pair Y, Z is considered; namely:

(i′) Y⊥Z

by which is meant every yεY is perpendicular to every xεX. Clearly (i′) implies (i) since if xεY∩Z with Y⊥Z, then x⊥x so x=0 since the inner product is definite.

When (i′) and (ii) hold, then X=Y⊕^(⊥)Z, which is referred to as an “orthogonal decomposition or projection” of X onto Y (or Z).

Thus, (X=Y⊕^(⊥)Z)

(X=Y⊕Z), but the converse usually does not hold.

For any submodule Y, the following is defined: Y ^(⊥) ={yεY;(∀xεX)(y⊥x)}.

Clearly Y^(⊥) is a submodule of X and Y⊥Y^(⊥). Subsequently, some conditions under which X=Y⊕^(⊥)(Y^(⊥)) (i.e., when X=Y+Y^(⊥)) are examined, as these conditions are key to the Levinson algorithm. First, the converse is examined.

Proposition 5 Let X=Y⊕_(⊥)Z, then

-   -   (i) Z=Y^(⊥) and Y=Z^(⊥)     -   (ii) Y^(⊥⊥)==Y and Z^(⊥⊥)=Z.

As discussed above, it is not generally the case that X=Y+Y^(⊥) where Y⊂X are modules with a definite inner product. There are well-understood stood situations, however, when this does hold so that X=Y⊕Y^(⊥). For example, in the case of an

or

vector space which has a metric completeness property like a Banach or Hilbert space, X=Y⊕Y^(⊥) will hold for every subspace Y which is topologically closed. In particular, this will hold for every finite-dimensional subspace Y because finite-dimensional subspaces are always topologically closed. This latter finite result, in fact, holds for any division ring D, not merely D=

,

. Any finite-dimensional subspace Y⊂X of a D-vector space has an orthogonal basis and from that orthogonal basis an orthogonal projection X=Y⊕Y^(⊥) may be constructed.

Such finite orthogonal projections are required for the Levinson algorithm because they correspond precisely to minimum power residuals in finite-lag, multi-channel linear prediction. This leads to the following definition:

Let A be a *-algebra. An A-module X is said to “admit compact projections” if for every f.g. submodule Y⊂X, the following exists: X=Y⊕Y^(⊥).

It is noted that if X admits compact projections, then every submodule Y⊂X which is of the form Y=Z^(⊥) for some f.g. submodule Z will also satisfy X=Y⊕Y^(⊥) because by Prop. 5, Y^(⊥)=Z^(⊥⊥)=Z so Y⊕Y^(⊥)=Z^(⊥)⊕Z=X. However it is not generally the case that if Y⊂X satisfies Y^(⊥) is f.g, then X=Y⊕Y^(⊥) because for this result, it is required that Y=Y^(⊥⊥), which generally does not hold.

Further, A itself can be defined to admit compact projections if every A-module X with definite inner product admits compact projections. For example, the results above show that every division ring admits compact projections.

The next step is to find a generalization of division rings for which this property continues to hold.

A pseudo-inverse of a scalar aεA is a a′εA such that aa′aεa. A ring A is called regular if every element has a pseudo-inverse. Clearly if aεA has an inverse a⁻¹ then a⁻¹ is a pseudo-inverse: aa⁻¹a=1a=a. However, many scalars have pseudo-inverses that are not units; for example, for any bεA, 0b0=0 so b is a pseudo-inverse of 0. This also shows that pseudo-inverses are not unique.

Regular rings can be easily constructed. For example, if {D_(ν); νεN} is a set of division rings, then

$\prod\limits_{v}D_{v}$ is a regular ring because a pseudo-inverse of

$\left( a_{v} \right) \in {\prod\limits_{v}D_{v}}$ can be defined by the rule

$a_{v}^{\prime} = \left\{ {\begin{matrix} {a_{v}^{- 1},} & {{{if}\mspace{14mu} a_{v}} \neq 0} \\ {0,} & {{{if}\mspace{14mu} a_{v}} = 0} \end{matrix}.^{2}} \right.$

However, regular rings are too special; generalization of this concept is needed. It is assumed that A is a *-algebra, in which

is a subset of A, wherein A is defined to be

-regular if every aε

has a pseudo-inverse.

Normal-regular, hermitian-regular, and semi-positive-regular rings are of particular interest.

An “idempotent” is an eεA for which e²=e. It is noted that a projection, as previously defined, is an hermitian idempotent. A is “indecomposable” if 0,1 are the only idempotents in A.

Proposition 6:

-   -   (i) Let A be a definite *-algebra. If A⁺ ⊂unit(A) then A is a         division ring. If, in addition, A⁺ ⊂Z(A), then A is normal.     -   (ii) An indecomposable, definite, semi-positive-regular         *-algebra is a division ring. If, in addition, A⁺ ⊂Z(A), then A         is normal.

Corollary VII.1 Let A be a symmetric algebra, then

(A) is a field and A is a normal division ring which is a

(A)*-algebra.

Proposition 7 (The Projection Theorem) Every hermitian regular ring admits compact projections. The following formulation can be used to calculate the projection coefficients. It is assumed that A be a hermitian regular ring and X a left A-module with definite inner product <,>, and that Y⊂X be a finitely generated submodule. Accordingly, the following needs to be proved: X=Y+Y^(⊥).

If Y={0} then Y^(⊥)=X so the result is trivial. So assume Y=span_(A)(y₁, . . . , y_(n)), n≧1. The result may be proved by induction on n, as follows.

For n=1:

Let xεX. Since ²|y₁|εA is hermitian and A is hermitian regular, ²|y₁| has a pseudo-inverse (²|y₁|)′. Define e=x−(

x,y ₁

(² |y ₁|)′)·y ₁, then xεspan_(A)(y₁)+span_(A)(e) so it is sufficient to show that y₁⊥e.

e,y₁

=

x,y₁

−

x,y₁

·²|y₁|′·²|y₁|=

x,y₁

·p=

x,p*·y₁

, where p=1−²|y₁|′·²|y₁|. So it is sufficient to show that p*·y₁=0.

$\begin{matrix} {{\,^{2}{{p^{*} \cdot y_{1}}}} = \left\langle {{p^{*} \cdot y_{1}},{p^{*} \cdot y_{1}}} \right\rangle} \\ {= {p^{*} \cdot {\,^{2}{y_{1}}} \cdot p}} \\ {= {p^{*} \cdot {\,^{2}{y_{1}}} \cdot \left( {1 - {{{}_{}^{}{y_{1}}_{}^{}} \cdot {\,^{2}{y_{1}}}}} \right)}} \\ {= {p^{*} \cdot \left( {{\,^{2}{y_{1}}} - {{\,^{2}{y_{1}}} \cdot {{}_{}^{}{y_{1}}_{}^{}} \cdot {\,^{2}{y_{1}}}}} \right)}} \\ {= {p^{*} \cdot \left( {{\,^{2}{y_{1}}} - {\,^{2}{y_{1}}}} \right)}} \\ {= {p^{*} \cdot 0}} \\ {= 0.} \end{matrix}$ <,> is definite so p*·y₁=0.

Let n≧2 and assume the result holds for n:

Let Y=span_(A)(y₁, . . . , y_(n), y_(n+1)) and xεX. By the inductive hypothesis applied twice, scalars a₁, . . . a_(n), b₁, . . . b_(n)εA and e,ƒεX are found such that x=a ₁ y ₁ + . . . +a _(n) y _(n) +e,e⊥y ₁ , . . . ,y _(n) y _(n+1) =b ₁ y ₁ + . . . +b _(n) y _(n) +ƒ,ƒ⊥y ₁ , . . . ,y _(n) Also by the n=1 case, e=aƒ+ē,ē⊥ƒ. Then

$\begin{matrix} {x = {{a_{1}y_{1}} + \cdots + {a_{n}y_{n}} + e}} \\ {= {{a_{1}y_{1}} + \cdots + {a_{n}y_{n}} + {\alpha\; f} + \overset{\_}{e}}} \\ {= {{a_{1}y_{1}} + \cdots + {a_{n}y_{n}} + {\alpha\left( {y_{n + 1} - {b_{1}y_{1}} - \cdots - {b_{n}y_{n}}} \right)} + \overset{\_}{e}}} \\ {= {{\left( {a_{1} - {\alpha\; b_{1}}} \right)y_{1}} + \cdots + {\left( {a_{n} - {\alpha\; b_{n}}} \right)y_{n}} + {\alpha\; y_{n + 1}} + \overset{\_}{e}}} \end{matrix}$ so it sufficient to show ē⊥y₁, . . . , y_(n), y_(n+1).

Both e,ƒ⊥y₁, . . . , y_(n) so ē=(e−αƒ)⊥y₁, . . . , y_(n).

But, then

y_(n+1),ē

=b₁

y₁,ē

+ . . . +b_(n)

y_(n),ē

+

ƒ,ē

=0 by definition of ē so ē⊥y_(n+1) also.

By induction, the result holds for all n≧1.

Prop. VII.3.b (Constructive Form of the Projection Theorem) Let A be a hermitian regular ring and X a left A-module with definite inner product <,>. Let y₁, y₂, . . . εX be a (possibly infinite) sequence of elements. To project xεX onto y₁, y₂, . . . , the following is noted.

For n=0: x=0+e⁽⁰⁾, where e⁽⁰⁾=x.

$\begin{matrix} {{{For}\mspace{14mu} n} = {1:}} & {x = {{a_{1}^{(1)} \cdot y_{1}} + {e^{(1)}\mspace{14mu}{where}\mspace{14mu}\left\{ \begin{matrix} {a_{1}^{(1)} = \left( {\left\langle {x,y_{1}} \right\rangle\mspace{14mu}{{}_{}^{}{y_{1}}_{}^{}}} \right)} \\ {e^{(1)} = {x - {a_{1}^{(1)} \cdot y_{1}}}} \end{matrix} \right.}}} \end{matrix}$ and ²|y₁|′ is a pseudo-inverse of the hermitian element ²|y₁|.

For n+1, n≧1, the following projections onto n generators result:

-   -   (i) Project x onto y₁, y₂, . . . , y_(n):         x−a ₁ ^((n)) ·y ₁ + . . . a _(n) ^((n)) ·y _(n) +e ^((n)) ,e         ^((n)) ⊥y ₁ , . . . ,y _(n).     -   (ii) Project y_(n+1) onto y₁, y₂, . . . , y_(n):         y _(n+1) =b ₁ ^((n)) ·y ₁ + . . . b _(n) ^((n)) ·y         _(n)+ƒ^((n)),ƒ^((n)) ⊥y ₁ , . . . ,y _(n).     -   (iii) Project e^((n)) onto ƒ^((n)) using the n=1 case:         e ^((n))=α^((n))·ƒ^((n)) +ē ^((n)) ,ē ^((n))⊥ƒ^((n)).     -   (iv) Then

$\begin{matrix} {\begin{pmatrix} a_{1}^{({n + 1})} \\ \vdots \\ a_{n}^{({n + 1})} \\ a_{n + 1}^{({n + 1})} \end{pmatrix} = {\begin{pmatrix} a_{1}^{(n)} \\ \vdots \\ a_{n}^{(n)} \\ 0 \end{pmatrix} - {\alpha^{(n)} \cdot {\begin{pmatrix} b_{1}^{(n)} \\ \vdots \\ b_{n}^{(n)} \\ {- 1} \end{pmatrix}.}}}} \\ {e^{({n + 1})} = {\overset{\_}{e}}^{(n)}} \end{matrix}$

It is noted that if A is a field and every finite subset of y₁, y₂, . . . εX is linearly independent, then the coefficients a₁ ^((n))({right arrow over (y)},x), . . . , a_(n) ^((n))({right arrow over (y)},x)εA are unique. However, generally this will not hold; only the decomposition x=[a₁ ^((n))({right arrow over (y)},x)·y₁+ . . . +a_(n) ^((n))({right arrow over (y)},x)·y_(n)]+e^((n))({right arrow over (y)},x) itself is unique.

It is apparent that the class of

-regular rings is closed under direct products and quotients. However, it is difficult in general to infer

-regularity for the important class of matrix algebras

(n,n,A) from general assumptions concerning A.³ One method that applies to (3+1)-dimensional modeling is singular decomposition.

Singular decompositions are an abstract form of the singular value decompositions of ordinary matrix theory. Let

⊂A. Let aεA. A singular decomposition of a over

is an identity a=ubu⁻¹ where bε

and 3uεunit(A).

Lemma 5 Let A be

-regular where

⊂A. Let

⊂A and suppose every aε

has a singular decomposition over

, then A is

-regular.

Proposition 9. The matrix algebras

(n,n,

) and

(n,n,

) are normal regular; hence they are hermitian regular. The matrix algebra

(n,n,

) is symmetric regular. Hence it is hermitian regular.

Corollary 5 The matrix algebras

(n,n,D) for D=

,

,

admit compact projections.

Linear prediction is really a collection of general results of linear algebra. A discussion of the mapping of signals to vectors in such a way that the algorithm may be applied to optimal prediction is more fully described below.

According to the Yule-Walker Equations:

Let A be a *-algebra and Rε

((M+1),(M+1),A), M≧0. R is a toeplitz matrix if it has the form

${R = \begin{pmatrix} r_{0} & r_{1} & r_{2} & \cdots & \cdots & r_{M} \\ r_{- 1} & r_{0} & r_{1} & ⋰ & \; & \; \\ r_{- 2} & ⋰ & ⋰ & ⋰ & ⋰ & \; \\ \vdots & ⋰ & ⋰ & ⋰ & ⋰ & r_{2} \\ r_{{- M} + 1} & \; & ⋰ & ⋰ & ⋰ & r_{1} \\ r_{- M} & r_{{- M} + 1} & \cdots & r_{- 2} & r_{- 1} & r_{0} \end{pmatrix}};$ that is, using 0-based indexing, (∀0≦k,l≦M)(R_(k,l)=r_(l−k)). An hermitian toeplitz matrix must thus have the form

$R = \begin{pmatrix} r_{0} & r_{1} & r_{2} & \cdots & \cdots & r_{M} \\ r_{1}^{*} & r_{0} & r_{1} & ⋰ & \; & \; \\ r_{2}^{*} & ⋰ & ⋰ & ⋰ & ⋰ & \; \\ \vdots & ⋰ & ⋰ & ⋰ & ⋰ & r_{2} \\ r_{M - 1}^{*} & \; & ⋰ & ⋰ & ⋰ & r_{1} \\ r_{M}^{*} & r_{M - 1}^{*} & \cdots & r_{2}^{*} & r_{1}^{*} & r_{0} \end{pmatrix}$ so r_(−k)=r_(k)*. It is noted, in particular, that r₀ must be an hermitian scalar.

When R is toeplitz and no confusion will result, the following notation is used: (R_(k,l)=R_(l−k)). M is called the “order” of R.

Let R be a fixed hermitian toeplitz matrix of order M over scalars A. Yule-Walker parameters for R are scalars a ₁ , . . . ,a _(M),(²σ),b ₀ , . . . ,b _(M−1),(²τ)εA satisfying the Yule-Walker equations

$\begin{matrix} \begin{Bmatrix} {{\sum\limits_{m = 0}^{M}{a_{m}R_{p - m}}} = {{\,^{2}\sigma} \cdot \delta_{p}}} \\ {{\sum\limits_{m = 0}^{M}{b_{m}R_{p - m}}} = {{\,^{2}\tau} \cdot \delta_{M - p}}} \end{Bmatrix} & {{p = 0},\cdots\mspace{11mu},M,} \end{matrix}$ where a₀=b_(M)=1 is defined, and δ is the Kronecker delta function

$\delta_{p} = \left\{ {\begin{matrix} {1;{p = 0}} \\ {0;{p \neq 0}} \end{matrix}.} \right.$

It is noted that no claim concerning existence or uniqueness of a₁, . . . , a_(M), (²σ), b₀, . . . , b_(M−1), (²τ)εA is implied. Also the notation ²σ,²τ does not imply that these parameters are hermitian (although there are important cases in which the hermitian property holds).

The scalars a₁, . . . , a_(M), ²τ are called the “forward” parameters and b₀, . . . , b_(M−1), ²τ are the “backwards” parameters. The definitions a₀=b_(M)=1 always is made without further comment.

When M=0, the Yule-Walker parameters are simply ²σ,²τ and the Yule-Walker equations reduce to ²σ=a₀R₀=b₀R₀=²τ. This is one case in which it can be concluded that ²σ,²τ are hermitian scalars.

Lemma 6 (The γ Lemma) Let a₁, . . . , a_(M), (²σ), b₀, . . . , b_(M−1), (²τ)εA be Yule-Walker parameters for R. Define

$\gamma = {\sum\limits_{m = 0}^{M}\;{\sum\limits_{k = 0}^{M}\;{a_{m}R_{k - m + 1}{b_{k}^{*}.}}}}$

Then,

$\gamma = \left\{ {\begin{matrix} {\sum\limits_{m = 0}^{M}{a_{m}R_{M - m + 1}}} \\ {\sum\limits_{m = 0}^{M}{R_{m + 1}b_{m}^{*}}} \end{matrix}.} \right.$

Let X be a left A-module with inner product. A (possibly infinite) sequence x₀, x₁, . . . , x_(M), . . . εX is called toeplitz if (∀m≧n≧0) the inner product

x_(n),x_(m)

depends only on the difference m−n.

For such a sequence, the autocorrelation sequence R_(m)=R_(m)(x₀, x₁, . . . )εA, mε

can be defined by

$R_{m} = \left\{ \begin{matrix} {\left\langle {x_{0},x_{m}} \right\rangle;{m \geq 0}} \\ {\left\langle {x_{m},x_{0}} \right\rangle;{m < 0}} \end{matrix} \right.$ and, then:

$\left\{ {\begin{matrix} {\left( {\forall{m \in Z}} \right)\left( {R_{- m} = R_{m}^{*}} \right)} \\ {\left( {{\forall m},{n \in Z}} \right)\left( {R_{m - n} = \left\langle {x_{n},x_{m}} \right\rangle} \right)} \end{matrix}.} \right.$

This means that if R^((M))=R^((M))(x₀, x₁, . . . )ε

((M+1),(M+1),A), M≧0 is defined by the rule R _(n,m) ^((M)) =R _(m−n),0≦m,n≦M, then R^((M)) is an hermitian toeplitz matrix of order M over A.

An autocorrelation matrix (of order M) can be defined to be an hermitian toeplitz matrix R^((M)) which derives from a toeplitz sequence x₀, x₁, . . . , x_(M), . . . εX as above.

Thus, R^((M)) is just the Gram matrix of the vectors x₀, x₁, . . . , x_(M).

Now assume further that the inner product on X is definite and that X admits compact projections.

Accordingly, for any M≧0, X=span_(A)(x₀, . . . , x_(M))⊕^(⊥)(span_(A)(x₀, . . . , x_(M)))^(⊥) since X admits compact projections; and so there are scalars a₁ ^((M)), . . . , a, (²σ^((M))), b₀ ^((M)), . . . , b_(M−1) ^((M)), (²τ^((M)))εA and unique vectors e^((M)),ƒ^((M))εX satisfying the following:

$\quad\left\{ \begin{matrix} {{x_{0} = {{- {\sum\limits_{m = 1}^{M}\;{a_{m}^{(M)}x_{m}}}} + e^{(M)}}},{e^{(M)}\bot x_{1}},\cdots\mspace{11mu},x_{M}} \\ {{x_{M} = {{- {\sum\limits_{m = 0}^{M - 1}{b_{m}^{(M)}x_{m}}}} + f^{(M)}}},{f^{(M)}\bot x_{0}},\cdots\mspace{11mu},{x_{M - 1}.}} \\ {{{{}_{}^{}{}_{}^{(M)}} = {e^{(M)}}},{{{}_{}^{}{}_{}^{(M)}} = {\,^{2}{f^{(M)}}}}} \end{matrix} \right.$ a₁ ^((M)), . . . , a_(M) ^((M)), (²σ^((M))), b₀ ^((M)), . . . , b_(M−1) ^((M)), (²τ^((M)))εA is referred to as “Levinson parameters” of order M and the defining relations the “Levinson relations (or the Levinson equations).”

It is noted that since e^((M)),ƒ^((M)) are unique, so are ²σ^((M)),²τ^((M)). The coefficients a₁ ^((M)), . . . , a_(M) ^((M)), b₀ ^((M)), . . . , b_(M−1) ^((M)) are unique if x₀, x₁, . . . , x_(M) are linearly independent over A but this can only happen in the single-channel situation so that a₁ ^((M)), . . . , a_(M) ^((M)), b₀ ^((M)), . . . , b_(M−1) ^((M)) is regarded as non-unique unless explicitly stated. However, the vectors

${\left\lbrack {- {\sum\limits_{m = 1}^{M}\;{a_{m}^{(M)}x_{m}}}} \right\rbrack \in X},{\left\lbrack {- {\sum\limits_{m = 0}^{M - 1}{b_{m}^{(M)}x_{m}}}} \right\rbrack \in X}$ are always unique.

Defining a₀ ^((M))=b_(M) ^((M))=1, the Levinson equations can be written

$\left\{ {\begin{matrix} {{{\sum\limits_{m = 0}^{M}\;{a_{m}^{(M)}x_{m}}} = e^{(M)}},{e^{(M)}\bot x_{1}},\cdots\mspace{11mu},x_{M}} \\ {{{\sum\limits_{m = 0}^{M}{b_{m}^{(M)}x_{m}}} = f^{(M)}},{f^{(M)}\bot x_{0}},\cdots\mspace{11mu},x_{M - 1}} \end{matrix}.} \right.$

For M=0, the Levinson parameters are just ²σ^((M)),²τ^((M)) and the Levinson relations are

$\left\{ {\begin{matrix} {e^{(0)} = {{a_{0}^{(0)}x_{0}} = {x_{0} = {{b_{0}^{(0)}x_{0}} = f^{(0)}}}}} \\ {{{}_{}^{}{}_{}^{(0)}} = {{\,^{2}{x_{0}}} = {{}_{}^{}{}_{}^{(0)}}}} \end{matrix}.} \right.$

The scalars a₁ ^((M)), . . . , a_(M) ^((M)) are called the forward filter, b₀, . . . , b_(M−1), the backwards filter, e^((M)),ƒ^((M)) the forwards and backwards residuals, and ²|e^((M))|,²|ƒ^((M))| the forwards and backwards errors. The definitions a₀=b_(M)=1 will always be made without further comment.

Lemma 7 Let x₀, x₁, . . . , x_(M), . . . εX be a toeplitz sequence in the A-module X, where X has a definite inner product and admits compact projections, then any set of Levinson parameters of order M for x₀, x₁, . . . , x_(M), . . . are Yule-Walker parameters for the autocorrelation matrix R^((M))(x₀, x₁, . . . , x_(M), . . . ) and conversely.

Hence the scalars ²σ,²τεA of sets of Yule-Walker parameters for R^((M)) are unique and hermitian.

Corollary 6 (The Backshift Lemma) Let a₁ ^((M)), . . . , a_(M) ^((M)), (²σ^((M))), b₀ ^((M)), . . . , b_(M−1) ^((M)), (²τ^((M)))εA be Levinson parameters for the toeplitz sequence x₀, x₁, . . . , x_(M), x_(M+1), . . . εX. Defining

${{\overset{\Cup}{f}}^{(M)} = {\sum\limits_{m = 0}^{M}{b_{m}^{(M)}{x_{m + 1}.}}}},$ then {hacek over (ƒ)}^((M))⊥x₁, . . . , x_(M) and ²τ^((M))=²|{hacek over (ƒ)}^((M))|.

The Levinson Algorithm is provides a fast way of extending Levinson parameters a₁ ^((M)), . . . , a_(M) ^((M)), (²σ^((M))), b₀ ^((M)), . . . , b_(M−1) ^((M)), (²τ^((M)))εA of order M for a toeplitz sequence x₀, x₁, . . . , x_(M), . . . εX to Levinson parameters a₁ ^((M+1)), . . . , a_(M−1) ^((M+1)), (²σ^((M+1))), b₀ ^((M+1)), . . . , b_(M) ^((M+1)), (²τ^((M+1)))εA of order (M+1).

This can be derived by using Lem. 7 to reduce the problem to the Yule-Walker equations, which can be put into the matrix form:

${\begin{pmatrix} 1 & a_{1}^{(M)} & \cdots & a_{M}^{(M)} \\ b_{0}^{(M)} & \cdots & b_{M - 1}^{(M)} & 1 \end{pmatrix}R^{(M)}} = {\begin{pmatrix} {{}_{}^{}{}_{}^{(M)}} & 0 & \cdots & 0 \\ 0 & \cdots & 0 & {{}_{}^{}{}_{}^{(M)}} \end{pmatrix}.}$

Moreover, the hermitian, toeplitz form of the autocorrelation matrices implies that R^((M+1)) can be blocked as both

$\begin{matrix} \; & \; & \; & \; & \; & {R^{({M + 1})} = \begin{pmatrix} \; & \; & \; & \; & R_{M + 1} \\ \; & R^{(M)} & \rightarrow & \; & R_{M} \\ \; & \downarrow & \; & \; & \vdots \\ \; & \; & \; & \; & R_{1} \\ R_{M + 1}^{*} & R_{M}^{*} & \cdots & R_{1}^{*} & R_{0} \end{pmatrix}} \\ {and} & \; & \; & \; & \; & \; \\ \; & \; & \; & \; & \; & {R^{({M + 1})} = {\begin{pmatrix} R_{0} & R_{1} & \cdots & R_{M} & R_{M + 1} \\ R_{1}^{*} & \; & \; & \; & \; \\ \vdots & \; & R^{(M)} & \rightarrow & \; \\ R_{M}^{*} & \; & \downarrow & \; & \; \\ R_{M + 1}^{*} & \; & \; & \; & \; \end{pmatrix}.}} \end{matrix}$

This also shows how the coefficient R_(M+1) adds the new information while passing from order to (M+1).

Simple manipulations on these matrix relations easily yield recursive formulae expressing a₁ ^((M+1)), . . . , a_(M+1) ^((M+1)), (²σ_((M+1))), b₀ ^((M+1)), . . . , b_(M) ^((M+1)), (²τ^((M+1))) in terms of a₁ ^((M)), . . . , a_(M) ^((M)), (²σ^((M))), b₀ ^((M)), . . . , b_(M−1) ^((M)), (²τ^((M))) and R_(M+1) with the proviso that ²σ^((M)) and ²τ^((M)) are invertible in A. This is the algorithmic meaning of non-singularity although in many cases it can be directly related to the non-singularity of the matrices R^((M)).

A good illustration of the general commutative, non-singular theory are the Szegö polynomials:

Let μ be a real measure on the unit circle, let A=

, and X be the complex functions whose singularities are contained in the interior of the unit circle (i.e., the z-transforms of causal sequences). For ƒ,gεX define

⟨f, g⟩_(μ) = ∫_(−π)^(π)f(𝕖^(𝕚 ω))g(𝕖^(𝕚 ω))^(*)𝕕μ(𝕖^(𝕚ω)). ²|ƒ|_(μ)=0 is clearly equivalent to ƒ=0 a.e.(μ) and there are a variety of assumptions that can be made about μ to ensure that, in this case, ƒ=0 identically. For example, if the set of points of discontinuity Δ(μ)={ω; μ{ω}>0} form a set of uniqueness for the trigonometric polynomials. Assuming that such a condition holds,

−,−

_(μ) is a definite inner product on X.

The sequence x₀, x₁, . . . , x_(M), . . . εX is defined simply as z⁰, z⁻¹, z⁻², . . . which is toeplitz because

⟨z^(−n), z^(−m)⟩_(μ) = ∫_(−π)^(π)𝕖^(−𝕚 n ω)(𝕖^(−𝕚 m ω))^(*)𝕕μ(ω) = ∫_(−π)^(π)𝕖^(𝕚 (m − n)ω)𝕕μ(ω) depends only on (m−n).

Once again, there are various analytic assumptions which can be made about μ which will imply that the autocorrelation matrices R_(μ) ^((M))ε

((M+1),(M+1),

) are non-singular. In such cases ²σ^((M)),²τ^((M))≠0; i.e., ²σ^((M)) and ²τ^((M)) are invertible in

.

Therefore, with appropriate analytic assumptions, the M-th order Szegö polynomials for the measure μ can be well-defined as the Levinson residuals e^(μ) ^((M))(z),ƒ_(μ) ^((M))(z) of the sequence z⁰, z⁻¹, z⁻², . . . .

e_(μ) ^((M))(z),ƒ_(μ) ^((M))(z) are M-th order polynomials (in z⁻¹) which are perpendicular to z⁻¹, z⁻², . . . , z^(−M) and 1, z⁻¹, . . . , z^(−M+1) respectively in the μ-inner product. These orthogonality properties make then extremely useful for certain signal processing tasks.

Once non-commutative scalars are introduced, for example, by passing to a multi-channel situation, the previous method breaks down for the reasons previously discussed: (i) multi-channel correlations introduce unremovable degeneracies in the autocorrelation matrices making them highly non-singular; (ii) the notion of “non-singularity” itself becomes problematic. For example, the determinant function may no longer test for invertibility.

The proximate effect of these problems is that at some stage M of the Levinson algorithm ²σ^((M)) or ²τ^((M)) may be non-invertible in A. As pointed out previously, in the single-channel situation with scalars in a division ring such as

,

,

this means ²σ^((M))=0 or ²τ^((M))=0, which can be regarded as meaning simply that the channel is highly correlated with its past M values. However, in other cases, such as multi-channel prediction with scalars A=

(K,K,

),

(K,K,

),

(K,K,

), K≧2 the non-invertibility of ²σ^((M)) or ²τ^((M)) is a result of a complex interaction between signals, channels, algebra, and geometry.

Thus, instead of looking for inverses to ²σ^((M)),²τ^((M)), the present invention, according to one embodiment, is based on pseudo-inverses, and, in fact, on the more general theory of compact projections.

According the present invention provides a non-commutative, singular Levinson algorithm, as discussed below. Let A be an hermitian-regular ring and X a left A-module with definite inner product, then by the Projection Theorem (Prop. 7), X admits compact projections so the Levinson parameters exist. For all M≧0, let a₁ ^((M)), . . . , a_(M) ^((M)), (²σ^((M))), b₀ ^((M)), . . . , b_(M−1) ^((M)), (²τ^((M)))εA be Levinson parameters of order M for a toeplitz sequence x₀, x₁, . . . , x_(M), . . . εX.

The constructive form of the Projection Theorem (Prop. VII.3.b) shows how to calculate the forward parameters a₁ ^((M)), . . . , a_(M) ^((M)), (²σ^((M))) inductively in four steps:

(i) Project x₀ onto x₁, . . . , x_(M).

But by definition,

$x_{0} = {\left( {- {\sum\limits_{m = 1}^{M}{a_{m}^{(M)}x_{m}}}} \right) + e^{(M)}}$ is this projection.

(ii) Project x_(M+1) onto x₁, . . . , x_(M).

By definition,

$x_{M} = {\left( {- {\sum\limits_{m = 0}^{M - 1}{b_{m}^{(M)}x_{m}}}} \right) + f^{(M)}}$ is the projection of x_(M) onto x₀, . . . , x_(M−1) but by the

Backshift Lemma,

$x_{M + 1} = {{\left( {- {\sum\limits_{m = 0}^{M - 1}{b_{m}^{(M)}x_{m + 1}}}} \right) + {\overset{\Cup}{f}}^{(M)}} = {\left( {- {\sum\limits_{m = 1}^{M}{b_{m - 1}^{(M)}x_{m}}}} \right) + {\overset{\Cup}{f}}^{(M)}}}$ is a projection of x_(M+1) onto x₁, . . . , x_(M), with ²τ^((M))=²|{hacek over (ƒ)}^((M))|.

(iii) Project e^((M)) onto {hacek over (ƒ)}^((M)) using a pseudo-inverse of ²|{hacek over (ƒ)}^((M))|. It is noted that such a pseudo-inverse exits since ²|{hacek over (ƒ)}^((M))| is hermitian and A is hermitian-regular: e ^((M))=α^((M)){hacek over (ƒ)}^((M)) +ē ^((M)),(ē ^((M))⊥{hacek over (ƒ)}^((M))) α^((M)) =

e ^((M)),{hacek over (ƒ)}^((M))

·²|{hacek over (ƒ)}^((M)) |′=

e ^((M)),{hacek over (ƒ)}^((M))

·(²τ^((M)))′=γ^((M))·(²τ^((M)))′, where γ^((M)) =

e ^((M)),{hacek over (ƒ)}^((M))

.

(iv) Then,

$\begin{matrix} \left\{ \begin{matrix} {\begin{pmatrix} \left( {- a_{1}^{({M + 1})}} \right) \\ \vdots \\ \left( {- a_{M}^{({M + 1})}} \right) \\ \left( {- a_{M + 1}^{({M + 1})}} \right) \end{pmatrix} = {\begin{pmatrix} \left( {- a_{1}^{(M)}} \right) \\ \vdots \\ \left( {- a_{M}^{(M)}} \right) \\ 0 \end{pmatrix} - {\alpha^{(M)} \cdot \begin{pmatrix} \left( {- b_{0}^{(M)}} \right) \\ \vdots \\ \left( {- b_{M - 1}^{(M)}} \right) \\ {- 1} \end{pmatrix}}}} \\ \left. \left( {e^{({M + 1})} = {\overset{\_}{e}}^{(M)}} \right)\Rightarrow\left( {{{}_{}^{}{}_{}^{\left( {M + 1} \right)}} = {\,^{2}{{\overset{\_}{e}}^{(M)}}}} \right) \right. \end{matrix}\Rightarrow \right. \\ \left\{ \begin{matrix} {\begin{pmatrix} a_{0}^{({M + 1})} \\ a_{1}^{({M + 1})} \\ \vdots \\ a_{M}^{({M + 1})} \\ a_{M + 1}^{({M + 1})} \end{pmatrix} = {\begin{pmatrix} a_{0}^{(M)} \\ a_{1}^{(M)} \\ \vdots \\ a_{M}^{(M)} \\ a_{M + 1}^{(M)} \end{pmatrix} - {\alpha^{(M)} \cdot \begin{pmatrix} b_{- 1}^{(M)} \\ b_{0}^{(M)} \\ \vdots \\ b_{M - 1}^{(M)} \\ b_{M}^{(M)} \end{pmatrix}}}} \\ {{{}_{}^{}{}_{}^{\left( {M + 1} \right)}} = {\,^{2}{{\overset{\_}{e}}^{(M)}}}} \end{matrix} \right. \end{matrix}$ by canceling the signs and defining

$\left\{ {\begin{matrix} {a_{0}^{(M)} = {a_{0}^{({M + 1})} = {b_{M}^{(M)} = {b_{M + 1}^{({M + 1})} = 1}}}} \\ {a_{M + 1}^{(M)} = {b_{- 1}^{(M)} = 0}} \end{matrix}.} \right.$

The same basic reasoning can be applied to obtain the backwards parameters of the projection of x_(M+1) onto x₀, . . . , x_((M+1)−1)=x_(M), However, by the Backshift Lemma,

$x_{M + 1} = {{\left( {- {\sum\limits_{m = 0}^{M - 1}{b_{m}^{(M)}x_{m + 1}}}} \right) + {\overset{\Cup}{f}}^{(M)}} = {\left( {- {\sum\limits_{m = 1}^{M}{b_{m - 1}^{(M)}x_{m}}}} \right) + {\overset{\Cup}{f}}^{(M)}}}$ is a projection onto x₁, . . . , x_(M). So the generators x₁, . . . , x_(M) to x₀, x₁, . . . , x_(M) are enlarged:

(i) Project x_(M+1) onto x₁, . . . , x_(M).

By the above,

$x_{M + 1} = {\left( {- {\sum\limits_{m = 1}^{M}{b_{m - 1}^{(M)}x_{m + 1}}}} \right) + {\overset{\Cup}{f}}^{(M)}}$ is this projection.

(ii) Project x₀ onto x₁, . . . , x_(M):

$x_{0} = {\left( {- {\sum\limits_{m = 1}^{M}{a_{m}^{(M)}x_{m}}}} \right) + e^{(M)}}$

(iii) Project {hacek over (ƒ)}^((M)) onto e^((M)) using a pseudo-inverse of ²|e^((M))|: {hacek over (ƒ)}^((M))=β^((M)) e ^((M))+ {hacek over (ƒ)} ^((M)),( {hacek over (ƒ)} ^((M)) ⊥e ^((M))) β^((M))=

{hacek over (ƒ)}^((M)) ,e ^((M))

² ·|e ^((M))|′=

{hacek over (ƒ)}^((M)) ,e ^((M))

·(²τ^((M)))′=(γ^((M)))*·(²τ^((M)))′, where γ^((M)) =

e ^((M)),{hacek over (ƒ)}^((M))

.

(iv) Then

$\left\{ \begin{matrix} {\begin{pmatrix} \left( {- b_{1}^{({M + 1})}} \right) \\ \vdots \\ \left( {- b_{M}^{({M + 1})}} \right) \\ \left( {- b_{0}^{({M + 1})}} \right) \end{pmatrix} = {\begin{pmatrix} \left( {- b_{0}^{(M)}} \right) \\ \vdots \\ \left( {- b_{M - 1}^{(M)}} \right) \\ 0 \end{pmatrix} - {\beta^{(M)} \cdot \begin{pmatrix} \left( {- a_{1}^{(M)}} \right) \\ \vdots \\ \left( {- a_{M}^{(M)}} \right) \\ {- 1} \end{pmatrix}}}} \\ {\left. \left( {f^{({M + 1})} = {\overset{\_}{\overset{\Cup}{f}}}^{(M)}} \right)\Rightarrow\left( {{{}_{}^{}{}_{}^{\left( {M + 1} \right)}} = {\,^{2}{{\overset{\_}{\overset{\Cup}{f}}}^{(M)}}}} \right) \right.\mspace{70mu}} \end{matrix}\Rightarrow\left\{ {\begin{matrix} {\begin{pmatrix} b_{0}^{({M + 1})} \\ b_{1}^{({M + 1})} \\ \vdots \\ b_{M - 1}^{({M + 1})} \\ b_{M + 1}^{({M + 1})} \end{pmatrix} = {\begin{pmatrix} b_{- 1}^{(M)} \\ b_{1}^{(M)} \\ \vdots \\ b_{M - 1}^{(M)} \\ b_{M}^{(M)} \end{pmatrix} - {\beta^{(M)} \cdot \begin{pmatrix} a_{0}^{(M)} \\ a_{1}^{(M)} \\ \vdots \\ a_{M}^{(M)} \\ a_{M + 1}^{(M)} \end{pmatrix}}}} \\ {{{{}_{}^{}{}_{}^{\left( {M + 1} \right)}} = {\,^{2}{{\overset{\_}{\overset{\Cup}{f}}}^{(M)}}}}\mspace{200mu}} \end{matrix},} \right. \right.$ again by canceling the signs and defining

$\left\{ {\begin{matrix} {a_{0}^{(M)} = {a_{0}^{({M + 1})} = {b_{M}^{(M)} = {b_{M + 1}^{({M + 1})} = 1}}}} \\ {{a_{M + 1}^{(M)} = {b_{- 1}^{(M)} = 0}}} \end{matrix}.} \right.$

These equations can be summarized as:

$\left\{ {\begin{matrix} {{{\begin{Bmatrix} {a_{m}^{({M + 1})} = {a_{m}^{(M)} - {\alpha^{(M)} \cdot b_{m - 1}^{(M)}}}} \\ {b_{m}^{({M + 1})} = {b_{m - 1}^{(M)} - {\beta^{(M)} \cdot a_{m}^{(M)}}}} \end{Bmatrix}m} = 0},\ldots\;,{M + 1}} \\ {{{}_{}^{}{}_{}^{\left( {M + 1} \right)}} = {\,^{2}{{\overset{\_}{e}}^{(M)}}}} \\ {{{}_{}^{}{}_{}^{\left( {M + 1} \right)}} = {\,^{2}{{\overset{\_}{\overset{\Cup}{f}}}^{(M)}}}} \end{matrix},{{where}\left\{ {\begin{matrix} \left\{ \begin{matrix} {{e^{(M)} = {{\alpha^{(M)}{\overset{\Cup}{f}}^{(M)}} + {\overset{\_}{e}}^{(M)}}},\left( {{\overset{\_}{e}}^{(M)}\bot{\overset{\Cup}{f}}^{(M)}} \right)} \\ {{\alpha^{(M)} = {\gamma^{(M)}\left( {{}_{}^{}{}_{}^{(M)}} \right)}^{\prime}}\mspace{194mu}} \end{matrix} \right. \\ \left\{ \begin{matrix} {{{\overset{\Cup}{f}}^{(M)} = {{\beta^{(M)}e^{(M)}} + {\overset{\_}{\overset{\Cup}{f}}}^{(M)}}},\left( {{\overset{\_}{\overset{\Cup}{f}}}^{(M)}\bot{\overset{\Cup}{f}}^{(M)}} \right)} \\ {{\beta^{(M)} = {\left( \gamma^{(M)} \right)^{*}\left( {{}_{}^{}{}_{}^{(M)}} \right)^{\prime}}}\mspace{169mu}} \end{matrix} \right. \\ {\gamma^{(M)} = \left\langle {e^{(M)},{\overset{\Cup}{f}}^{(M)}} \right\rangle} \end{matrix}.} \right.}} \right.$

Thus, ē^((M)), {hacek over (ƒ)} ^((M)) can be eliminated by analyzing ²σ^((M+1)),²τ^((M+1)),γ^((M)):

Applying

−,e^((M))

to e^((M))=α^((M)){hacek over (ƒ)}^((M))+ē^((M)) yields:

$\begin{matrix} \begin{matrix} {{{}_{}^{}{}_{}^{(M)}} = {{\,^{2}{e^{(M)}}} = {{\alpha^{(M)}\left\langle {{\overset{\Cup}{f}}^{(M)},e^{(M)}} \right\rangle} + \left\langle {{\overset{\_}{e}}^{(M)},e^{(M)}} \right\rangle}}} \\ {= {{\alpha^{(M)}\left( \gamma^{(M)} \right)}^{*} + \left\langle {e^{({M + 1})},e^{(M)}} \right\rangle}} \end{matrix} & (0.1) \end{matrix}$ since e^((M+1))=ē^((M)) by definition.

Applying

−,e^((M))

to {hacek over (ƒ)}^((M))=β^((M))e^((M))+ {hacek over (ƒ)} ^((M)) yields: (γ^((M)))*=

{hacek over (ƒ)}^((M)) ,e ^((M))

=β^((M)2) |e ^((M))|+

{hacek over (ƒ)} ^((M)) ,e ^((M))

=β^((M)2)σ^((M))  (0.2) since {hacek over (ƒ)} ^((M))⊥e^((M)) by definition of {hacek over (ƒ)} ^((M)).

Applying

e^((M+1)),−

to e^((M))=α^((M)){hacek over (ƒ)}^((M))+ē^((M)) yields:

$\begin{matrix} \begin{matrix} {\left\langle {e^{({M + 1})},e^{(M)}} \right\rangle = {{\alpha^{(M)}\left\langle {e^{({M + 1})},{\overset{\Cup}{f}}^{(M)}} \right\rangle} + \left\langle {e^{({M + 1})},{\overset{\_}{e}}^{(M)}} \right\rangle}} \\ {= {{\,^{2}{e^{({M + 1})}}} = {{}_{}^{}{}_{}^{\left( {M + 1} \right)}}}} \end{matrix} & (0.3) \end{matrix}$ since e^((M+1))=ē^((M)) and ē^((M))⊥{hacek over (ƒ)}^((M)) by definition of ē^((M)).

Substituting (0.1), (0.2) into (0.3) yields: ²σ^((M))=α^((M))β^((M)2)σ^((M))+²σ^((M+1))

²σ^((M+1))=(1−α^((M))β^((M)))·²σ^((M)*)

A similar argument shows ²τ^((M+1))=(1−β^((M))α^((M)))·²τ^((M)).

Now γ^((M))=

e^((M)),{hacek over (ƒ)}^((M))

by definition so using the two projection equations for e^((M)),{hacek over (ƒ)}^((M)) gives

$\gamma^{(M)} = {\left\langle {{\sum\limits_{m = 0}^{M}{a_{m}^{(M)}x_{m}}},{\sum\limits_{k = 0}^{M}{b_{k}^{(M)}x_{k + 1}}}} \right\rangle = {{\sum\limits_{m = 0}^{M}{\sum\limits_{k = 0}^{M}{a_{m}^{(M)}\left\langle {x_{m},x_{k + 1}} \right\rangle b_{k}^{{(M)}^{*}}}}} = {\sum\limits_{m = 0}^{M}{\sum\limits_{k = 0}^{M}{a_{m}^{(M)}R_{k - m + 1}{b_{k}^{{(M)}^{*}}.}}}}}}$

However, the γ Lemma, Lem. 6, implies that this expression can be computed in either of the forms

$\gamma^{(M)} = \left\{ {\begin{matrix} {\sum\limits_{m = 0}^{M}{a_{m}^{(M)}R_{M - m + 1}}} \\ {\sum\limits_{m = 0}^{M}{R_{m + 1}\left( b_{m}^{(M)} \right)}^{*}} \end{matrix};} \right.$ in which the first form can be arbitrarily chosen.

Theorem 1 (The Hermitian-regular Levinson Algorithm) Let A be an hermitian-regular ring and X a left A-module with definite inner product. Let x₀, . . . , x_(M), . . . εX be a toeplitz sequence and R₀, . . . , R_(M), . . . εA its autocorrelation sequence.

Define

$\left\{ {\begin{matrix} {a_{0}^{(0)} = {b_{0}^{(0)} = 1}} \\ {{{}_{}^{}{}_{}^{(0)}} = {{{}_{}^{}{}_{}^{(0)}} = R_{0}}} \end{matrix}.} \right.$

For M≧1, where a₁ ^((M)), . . . , a_(M) ^((M)), ²σ^((M)), b₀ ^((M)), . . . , b_(M−1) ^((M)), ²τ^((M))εA with ²τ^((M)),²σ^((M)) hermitian are given, define

$\left\{ {\begin{matrix} {a_{0}^{(M)} = {b_{M}^{(M)} = {a_{0}^{({M + 1})} = 1}}} \\ {a_{M + 1}^{(M)} = {b_{- 1}^{(M)} = 0}} \end{matrix}{and}\left\{ {\begin{matrix} {\gamma^{(M)} = {\sum\limits_{m = 0}^{M}\;{a_{m}^{(M)}R_{M - m + 1}}}} \\ {\alpha^{(M)} = {\gamma^{(M)} \cdot \left( {{}_{}^{}{}_{}^{(M)}} \right)^{\prime}}} \\ {\beta^{(M)} = {\gamma^{{(M)}*} \cdot \left( {{}_{}^{}{}_{}^{(M)}} \right)^{\prime}}} \end{matrix},} \right.} \right.$ where (−)′ denotes a pseudo-inverse.

Finally, define

$\left\{ {\begin{matrix} {{{\begin{Bmatrix} {a_{m}^{({M + 1})} = {a_{m}^{(M)} - {\alpha^{(M)} \cdot b_{m - 1}^{(M)}}}} \\ {b_{m}^{({M + 1})} = {b_{m - 1}^{(M)} - {\beta^{(M)} \cdot a_{m}^{(M)}}}} \end{Bmatrix}\mspace{25mu} m} = 0},\;\ldots\mspace{11mu},{M + 1}} \\ \left\{ \begin{matrix} {{{}_{}^{}{}_{}^{\left( {M + 1} \right)}} = {\left( {1 - {\alpha^{(M)}\beta^{(M)}}} \right)^{2}\sigma^{(M)}}} \\ {{{}_{}^{}{}_{}^{\left( {M + 1} \right)}} = {\left( {1 - {\beta^{(M)}\alpha^{(M)}}} \right)^{2}\tau^{(M)}}} \end{matrix} \right. \end{matrix}.} \right.$ Then for all M≧0, a₁ ^((M)), . . . , a_(M) ^((M)), ²σ^((M)), b₀ ^((M)), . . . , b_(M−1) ^((M)), ²τ^((M)) are Levinson parameters for x₀, . . . , x_(M), . . . .

It is noted that unlike non-singular forms of the algorithm, the residuals for singularity need not be tested and the increasing of the order M need not be stopped. Of course, in practice, the residuals is examined. For example, if ²σ^((M))=²τ^((M))=0 then at any order N>M, thus the following can be chosen:

$\quad\left\{ \begin{matrix} {{a_{m}^{(N)} = a_{m}^{(M)}},{m \leq M}} \\ {{a_{m}^{(N)} = 0},{m > M}} \\ {{{}_{}^{}{}_{}^{(N)}} = 0} \end{matrix} \right.$ and similarly for the backwards parameters.

More generally, if the eigenstructure of the residuals can be calculated then the dimensions of A and X can be reduced for later stages by passing to principal axes corresponding to invertible eigenvalues. However, there are tremendous conceptual and practical advantages to this approach because these reductions are not required.

In addressing the special cases of the Hermitian-singular Levinson Algorithm, the following corollary results:

Corollary 6 Let A be a symmetric algebra and x₀, . . . , x_(M), . . . εX a toeplitz sequence in a left A-module X with definite inner product.

(i) Then the Levinson algorithm applies and, moreover, for every M≧0, the following can be chosen:

$\left\{ {\begin{matrix} {\beta^{(M)} = \left( \alpha^{(M)} \right)^{*}} \\ {{{}_{}^{}{}_{}^{(M)}} = {{}_{}^{}{}_{}^{(M)}}} \end{matrix}.} \right.$

(ii) If, in addition, A is commutative, then the following can be chosen: b _(m) ^((M))=(a _(M−m) ^((M)))*,m=0, . . . ,M.

Thus, in this case, the backwards parameters do not need to be independently computed.

Cor. 6.i applies, for example, to single-channel prediction over

and Cor. 6.ii to single-channel prediction over

.

With respect to multi-channel four-dimensional Linear Prediction Theorem, Corollary 7 is stated.

Corollary 7 The Levinson algorithm applies to any

(K,K,D)-module X with definite inner product for D=

,

,

. In particular, the algorithm applies to any X=

(K,L,D) with inner product

x,y

=xy*.

Returning to the problem of modeling space curves, the present invention regards it as axiomatic that the points of a space curve must have a scale attached to them, a scale which may vary along the curve. This is because a space curve may wander globally throughout a spatial manifold.

There are several ways of extending a space curve

I ⁢ -> X _ ⁢ 3 to homogeneous coordinates

I ⁢ -> X _ ⁢ 3 × . One approach is to ignore the scale entirely by setting the scale coordinate σ=0. Another natural choice is have a uniform scale σ=1. However, it can be noted that these constant scales do not remain constant as 4-dimensional processing proceeds. As a result, there needs to be a good geometric interpretation for these scale changes.

The two major models used are characterized as either timelike or spacelike. The timelike model uses homogeneous coordinates (Δx,Δy,Δz,Δt). For data sampled at a uniform rate, Δt=constant so this is the uniform model above. However, there is no requirement of uniform sampling. It is noted that over the length of the curve, these homogeneous vectors can be added, maintaining a clear geometric interpretation:

${\sum\limits_{i}\left( {{\Delta\; x_{i}},{\Delta\; y_{i}},{\Delta\; z_{i}},{\Delta\; t_{i}}} \right)} = {\left( {{\Delta\; x_{total}},{\Delta\; y_{total}},{\Delta\; z_{total}},{\Delta\; t_{total}}} \right).}$

This is in distinction to the “velocities,” which are the projective versions of the homogeneous points:

${\overset{\rightarrow}{v}}_{i} = \left( {\frac{\Delta\; x_{i}}{\Delta\; t_{i}},\frac{\Delta\; y_{i}}{\Delta\; t_{i}},\frac{\Delta\; z_{i}}{\Delta\; t_{i}}} \right)$ which cannot be added along the curve without the scale Δt_(i).

The spacelike model uses the arc length Δs=√{square root over ((Δx)²+(Δy)²+(Δz)²)}{square root over ((Δx)²+(Δy)²+(Δz)²)}{square root over ((Δx)²+(Δy)²+(Δz)²)} as the scale. As with time the homogeneous coordinates are vectorial:

${\sum\limits_{i}\left( {{\Delta\; x_{i}},{\Delta\; y_{i}},{\Delta\; z_{i}},{\Delta\; s_{i}}} \right)} = {\left( {{\Delta\; x_{total}},{\Delta\; y_{total}},{\Delta\; z_{total}},{\Delta\; s_{total}}} \right).}$

The corresponding projective construct is the unit tangent vector:

$\hat{T} = {\left( {\frac{\Delta\; x}{\Delta\; s},\frac{\Delta\; y}{\Delta\; s},\frac{\Delta\; z}{\Delta\; s}} \right).}$

It is noted that

${\hat{T}}^{2} = {\frac{{\Delta\; x^{2}} + {\Delta\; y^{2}} + {\Delta\; z^{2}}}{\Delta\; s^{2}} = 1.}$

{circumflex over (T)} is (approximately) tangent to the space curve at the given point; i.e., parallel to the velocity {right arrow over (ν)}. However, unlike {right arrow over (ν)}, {circumflex over (T)} is always of length 1 so all information concerning the speed

$v = \frac{\Delta\; s}{\Delta\; t}$ of traversal of the curve is absent. In relativistic terms, the spacelike model is locally simultaneous.

Rather than a fault, the time-independence of the spacelike coordinates (Δx,Δy,Δz,Δs) is precisely the desired characteristic in certain situations, especially in gait modeling. For example, it is well-known from speech analysis that a single speaker does not speak the same phonemes at the same rates in different contexts. This is referred to as “time warping” and is a major difficulty in applying ordinary frequency-based modeling, which assume a constant rate of time flow, to speech. There are many semi-heuristic algorithms which have been developed to unwarp time in speech analysis. It is to be expected that the same phenomenon will occur in gait analysis not only because of differences in walking contexts, but simply because people do not behave uniformly even in uniform situations.

The concept “rate of time flow”, which is sometimes presented as meaningless, can actually be made quite precise. It simply means measuring time increments with respect to some other sequence of events. In the spacelike model, the measure of the rate of time flow is precisely

$\frac{\Delta\; t}{\Delta\; s}.$ This means that time is measured not by the clock but by how much distance is covered; i.e., purely by the “shape” of the space track. Time gets “warped” because the same distance may be traversed in different amounts of time. However, this effect is completely eliminated by use of spacelike coordinates.

For optics, the scale parameter for spacelike modeling is optical path length. It is this length which is meant when the statement is made that “light takes the shortest path between two points”. It is noted that the optical path is by no means straight in

³: its curvature is governed by the local index of refraction and the frequencies of the incident light.

Spatial time series are almost always presented as absolute positions (x_(i),y_(i),z_(i)) or increments (Δx_(i),Δy_(i),Δz_(i)). There are rare experimental situations in which spatial velocities

$\left( {\left( \frac{\mathbb{d}x}{\mathbb{d}t} \right)_{i},\left( \frac{\mathbb{d}y}{\mathbb{d}t} \right)_{i},\left( \frac{\mathbb{d}z}{\mathbb{d}t} \right)_{i}} \right)$ are directly measured. Remarkably, however, color vision entails the direct measurement of time rates-of-change. Each pixel on a time-varying image such as a video can be seen as a space curve moving through one of the three-dimensional vector space color systems, such as RGB, the C.I.E. XYZ system, television's Y/UV system, and so forth, all of which are linear transformations of one another. Thus, as vector spaces, these systems are just

³.

The human retina contains four types of light receptors; namely, 3 types of cones, called L,M, and S, and one type of rod. Rods specialize in responding accurately to single photons but saturate at anything above very low light levels. Rod vision is termed “scotopic” and because it is only used for very dim light and cannot distinguish colors, it can be ignored for our purposes. The cones, however, work at any level above low light up to extremely bright light such as the sun on snow. Moreover, it is the cones which distinguish colors. Cone vision is called “photopic” and so the color system presented herein is denoted “photopic coordinates.”

Each photoreceptor contains a photon-absorbing chemical called rhodopsin containing a component which photoisomerizes (i.e., changes shape) when it absorbs a photon. The rhodopsins in each of the receptor types have slightly different protein structures causing them to have selective frequency sensitivities.

Essentially, the L cones are the red receptors, the M cones the green receptors, and the S cones the blue receptors, although this is a loose classification. All the cones respond to all visible frequencies. This is especially pronounced in the L/M system whose frequency separation is quite small. Yet it is sufficient to separate red from green and, in fact, the most common type of color-blindness is precisely this red-green type in which the M cones fail to function properly.

It is noted that it is the number of photoisomerizations that matter. These are considerably fewer than the number of photons which reach the cone. Luminous efficiency is concerned with what one does see, not what one might see. It takes about three photoisomerizations to cause the cone to signal and it takes about 50 ms for the rhodopsin molecule to regenerate itself after photon absorption. So, generally, if the photoisomerization rate is anything above 60 photoisomerizations/sec, then the cone's response is continuous and additive. That is, the higher the photoisomerization rate at a given frequency, the larger is the cone's signal to the brain.

So the physiological three-dimensional color system is the LMS system, in which the coordinate values are the total photoisomerization rate of each of the cone types. All the other coordinate systems are implicitly derived from this one.

Since the LMS values are time rates, the homogeneous coordinates corresponding to the color (L_(i),M_(i),S_(i)) are (L_(i)·Δt_(i),M_(i)·Δt_(i),S_(i)·Δt_(i),Δt_(i)). It is noted that L_(i)·Δt_(i) equals the total number of photoisomerizations that occurred during the time interval t_(i) to t_(i)+Δt_(i) and similarly for the other coordinates. The homogeneous coordinates (l,m,s,t), where l is the number of photosomerizations of the L-system, m of the M-system, s of the S-system, and t the time, is called photopic coordinates.

Since there are various well-known approximate transformations from the standard RGB or XYZ systems to LMS, the photopic coordinate increments can be calculated: (Δl _(i) ,Δm _(i) ,Δs _(i) ,Δt _(i))=(L _(i) ·Δt _(i) ,M _(i) ·Δt _(i) ,S _(i) ·Δt _(i) ,Δt _(i)) along pixel color curve specified in any system.

The photopic coordinates (Δl,Δm,Δs,Δt) correspond to what is referred to as timelike coordinates for space curves. There are spacelike versions (Δl,Δm,Δs,Δκ) where Δκ is a photometric length of the photoisomerization interval (Δl,Δm,Δs). However, Δκ is much more complicated to define than the simple Pythagorean length √{square root over ((Δl)²+(Δm)²+(Δs)²)}{square root over ((Δl)²+(Δm)²+(Δs)²)}{square root over ((Δl)²+(Δm)²+(Δs)²)}.

Applying the Fundamental Theorem Prop. 3 to n=1 implies that any quaternion q can be written in the form q=uλu* with uε

and λε

. Thus, q=u(Re(λ)+Im(λ)I)u*=Re(λ)+Im(λ)(uIu*) so Sc(q)=Re(λ) and Vc(q) is the rotation of Im(λ)I determined by u.

However, by Prop. 4, u is not unique and this can also been seen from the basic geometry because there is not a unique rotation sending Im(λ)I to Vc(q).

However, if Im(λ)I is required to move in the most direct way possible; i.e., along a great circle, then this rotation is unique and defines an external uε

, unique up to sign. This can be denoted as the polar representation of a quaternion because it is directly related to the representation of Vc(q) in polar coordinates.

Let q=a+bI+cJ+dK=a+{right arrow over (ν)}. λ is an eigenvalue of

${〚\; q\; 〛} = \begin{pmatrix} {a + {b\; i}} & {c + {d\; i}} \\ {{- c} + {d\; i}} & {a - {b\; i}} \end{pmatrix}$ with characteristic polynomial p(x)=x²−2ax+|q|² and whose roots are a±νi, where ν=|{right arrow over (ν)}|=√{square root over (b²+c²+d²)} such that λ=a+νi is chosen.

Assuming c²+d²≠0, the unit vector

$\hat{\alpha} = \frac{{{- d}\; J} + {c\; K}}{\sqrt{c^{2} + d^{2}}}$ is such that {circumflex over (α)},I,{right arrow over (ν)} is a right-hand orthogonal system. So {right arrow over (ν)} is obtained from νI by right-hand rotation around {circumflex over (α)} by an angle φ. Clearly

${\cos(\varphi)} = \frac{b}{v}$ if b²+c²+d²≠0 and 0≦φ≦π. Since then

${0 \leq \frac{\varphi}{2} \leq \frac{\varphi}{2}},$

$\begin{matrix} {{\cos\left( \frac{\varphi}{2} \right)} = {\sqrt{\frac{1 + {\cos(\varphi)}}{2}} = \sqrt{\frac{v + b}{2\; v}}}} \\ {{\sin\left( \frac{\varphi}{2} \right)} = {\sqrt{\frac{1 - {\cos(\varphi)}}{2}} = \sqrt{\frac{v - b}{2\; v}}}} \end{matrix}$ and therefore

$u = {{{\cos\left( \frac{\varphi}{2} \right)} + {{\sin\left( \frac{\varphi}{2} \right)}\hat{\alpha}}} = {\frac{1}{\sqrt{2v}}{\left( {\sqrt{v + b} + {\left( \sqrt{v - b} \right)\hat{\alpha}}} \right).}}}$

So long as {right arrow over (ν)}≠{right arrow over (0)} singularities in this formula can be removed. However, there is an unremovable singularity at {right arrow over (ν)}={right arrow over (0)} whose behavior is analogous to the unremovable singularity at z=0 of

${{sgn}(z)} = \frac{z}{z}$ for zε

.

The present invention, according to one embodiment, represents quaternions in polar form; that is, a quaternion q, representing a three- or four-dimensional data point, is decomposed into the polar form q=uλu*, then the pair uε

,λε

are processed independently.

In particular, it is noted that the eigenvalues λ are in the commutative field

so that the simplifications of linear prediction which result from the commutativity, such as Cor.6.ii, apply to these values.

In this way, for example, a discrete spacetime path (αx_(n),Δy_(n),Δz_(n),Δt_(n)), nε

in

⁴ is first transformed into the quaternion path (Δt_(n)+Δx_(n)I+Δy_(n)J+Δz_(n)K, nε

) and then into the pair of paths (u_(n)ε

, nε

) and (λ_(n)ε

, nε

) for which separate linear prediction structures are determined.

These structures may either be combined or treated as separate parameters depending upon the application.

The modules that are of concern for the present invention are derived from measurable functions of the form:

×Ω

X, where X is an A-module with a definite inner product,

is some time parameter space (usually

or

), and Ω is a probability space with probability measure P. Thus Ψ is a stochastic process.

However, this definition also includes the deterministic case by setting Ω={*}, the 1-point space, and P(Ø)=0, P(Ω)=1.

Viewed as a function of the random outcomes ωεΩ, Ψ:Ω→X^(T) is regarded as a random path in X; i.e., Ψ induces a probability measure P_(Ψ) on the set of all paths {x(t):

→X}. In the deterministic case, the image of Ψ:Ω→X^(T) is just the single path x_(*)(t)=Ψ(t,*)εX and P_(Ψ) is concentrated at

${x_{*}\text{:}\mspace{14mu}{P_{\Psi}(E)}} = \left\{ {\begin{matrix} {1,{{{if}\mspace{14mu} x_{*}} \in E}} \\ {0,{{{if}\mspace{14mu} x_{*}} \notin E}} \end{matrix}.} \right.$

On the other hand, viewed as a function of the time parameter tε

, Ψ:

→X^(Ω) is regarded as a path of random elements of X: for every tε

, the value x(t) is an X-valued random variable ω

x(t)(ω)=Ψ(t,ω). In the deterministic case, x(t)=x_(*)(t) as defined above.

For example, given a random sample ω₁, . . . , ω_(N)εΩ, the resulting sampled paths can be viewed in two ways:

-   -   (i) As N randomly chosen paths x₁, . . . x_(N):         →X, defined by ((∀tε         )x_(ν)(t)=Ψ(t,ω_(ν))), ν=1, . . . , N     -   (ii) As a single path x:         →X^(N) defined by ((∀tε         )x(t)=         Ψ(t,ω₁), . . . , Ψ(t,ω_(N))         ) where, for each tε         , the list         Ψ(t,ω₁), . . . , Ψ(t,ω_(N))         εX^(N) is viewed as a random sample from X.

A conventional real-valued random signal s:

→

would be viewed as a path through the one-dimensional

-module X=

, with time parameter tε

.

It is important to note that a signal is really a (random or deterministic) path through some A-module with a definite inner product. The special case of this construction of interest is when the scalars A form a real or complex Banach space. With respect to Banach spaces, it is observed that many measurable functions ƒ:(Ξ,μ)→

, where (Ξ,μ) is a measure space and

is a Banach space, can be integrated

${\int\limits_{\Xi}{f\;{\mathbb{d}\mu}}} \in B$ and that this integral possesses the usual properties. When (Ω,P) is a probability space, this can be interpreted as the average or expected value

ɛ[f] = ∫_(Ω) f 𝕕P ∈ B.

For example, the matrix algebras M(n,n,D), D=

,

,

can be shown to be Banach spaces with their standard inner products.

Then any two random paths

${\times \Omega}\overset{\Psi,\Phi}{\rightarrow}X$ define a function

${{\times \Omega}\overset{({\Psi,\Phi})}{\rightarrow}}:$ (t,ω)

Ψ(t,ω),Φ(t,ω)

. In particular, any random path

${\times \Omega}\overset{\Psi}{\rightarrow}X$ defines

×Ω

B:(t,ω)

²|Ψ(t,ω)|.

Such functions can be averaged in two different ways: (1) with respect tε

, and (2) with respect to ωεΩ, or vice versa.

From the first perspective, for every ωεΩ, the following is formed:

${{value}\mspace{14mu}{\lim\limits_{T\rightarrow\infty}{\frac{1}{2T}{\int_{T}^{T}{{\,^{2}{{\Psi\left( {t,\omega} \right)}}}\ {\mathbb{d}t}}}}}} \in {B\left( {{or}\mspace{14mu}{\lim\limits_{N\rightarrow\infty}{\frac{1}{2N}{\sum\limits_{n = {- N}}^{N}\;{{\,^{2}{{\Psi\left( {n,\omega} \right)}}}\mspace{14mu}{when}\mspace{14mu} T\mspace{14mu}{is}\mspace{14mu}{discrete}}}}}} \right)}$ and then the function sending

$\left. \omega\mapsto{\lim\limits_{T\rightarrow\infty}{\frac{1}{2T}{\int_{T}^{T}{{\,^{2}{{\Psi\left( {t,\omega} \right)}}}\ {\mathbb{d}t}}}}} \right. \in B$ is a

-valued random variable on the probability space (Ω,P). As such, the expected value is formed:

${ɛ\left\lbrack {\lim\limits_{T\rightarrow\infty}{\frac{1}{2T}{\int_{- T}^{T}{{\,^{2}{{\Psi\left( {t,\omega} \right)}}}\ {\mathbb{d}t}}}}} \right\rbrack} \in {B.}$

Alternatively, for every tε

, the expected value ε[Ψ(t,ω)]ε

which, for 0-mean paths, is the variance at tε

can first be found, and then averaging these variances to form

${\lim\limits_{T\rightarrow\infty}{\frac{1}{2T}{\int_{- T}^{T}{{ɛ\left\lbrack {\,^{2}{{\Psi\left( {t,\omega} \right)}}} \right\rbrack}\ {\mathbb{d}t}}}}} \in {B.}$

Either of these double integrals may be regarded as the expected total power ²|Ψ| of the path and the only assumption that needs to be made concerning the interrelation between the probability and the geometry is that one or the other of these integrals is finite.

When this obtains, it can be shown that the two different methods of calculating this average coincide as in the Fubini Theorem:

${\,^{2}{\Psi }}\  = {{ɛ\left\lbrack {\lim\limits_{T\rightarrow\infty}{\frac{1}{2T}{\int_{- T}^{T}{{\,^{2}{{\Psi\left( {t,\omega} \right)}}}\ {\mathbb{d}t}}}}} \right\rbrack} = {\lim\limits_{T\rightarrow\infty}{\frac{1}{2T}{\int_{- T}^{T}{{ɛ\left\lbrack {\Psi\left( {t,\omega} \right)} \right\rbrack}\ {{\mathbb{d}t}.}}}}}}$

When

${\times \Omega}\overset{\Psi,\Phi}{\rightarrow}X$ are two such paths, then their inner product can be defined as

$\begin{matrix} {\left\langle {\Psi,\Phi} \right\rangle = {{\lim\limits_{T\rightarrow\infty}{\frac{1}{2T}{\int_{- T}^{T}{{ɛ\left\lbrack \left\langle {{\Psi\left( {t,\omega} \right)},{\Phi\left( {t,\omega} \right)}} \right\rangle \right\rbrack}\ {\mathbb{d}t}}}}} \in B}} \\ {{{and}\mspace{20mu}\left\langle {\Psi,\Phi} \right\rangle} = {{ɛ\left\lbrack {\lim\limits_{T\rightarrow\infty}{\frac{1}{2T}{\int_{- T}^{T}{\left\langle {{\Psi\left( {t,\omega} \right)},{\Phi\left( {t,\omega} \right)}} \right\rangle\ {\mathbb{d}t}}}}} \right\rbrack}.}} \end{matrix}$

This inner product becomes definite by identifying paths Ψ,Φ for which ²|Ψ−Φ|=0 in the usual manner; i.e., by considering equivalence classes of paths rather than the paths themselves.

The result is a well-defined path space

(X,Ω,P) which is a

-module with definite inner product determined by both the geometry of the

-module X and probability space (Ω,P).

Attention is now drawn to linear prediction on

(X,Ω,P). Let

${\times \Omega}\overset{\Psi}{\rightarrow}X$ be a path where

is discrete (or continuous but sampled at time increments Δt_(i)), then Ψ defines the sequence Ψ₀, Ψ₁, . . . , Ψ_(M), . . . ε

(X,Ω,P) of its past values Ψ_(m)(n,ω)=Ψ(n−m,ω).

This sequence is toeplitz since

$\begin{matrix} {\left\langle {\Psi_{k},\Psi_{m}} \right\rangle = {\lim\limits_{N\rightarrow\infty}{\frac{1}{2N}{\sum\limits_{n = {- N}}^{N}{ɛ\left\lbrack \left\langle {{\Psi_{k}\left( {n,\omega} \right)},{\Psi_{m}\left( {n,\omega} \right)}} \right\rangle \right\rbrack}}}}} \\ {= {\lim\limits_{N\rightarrow\infty}{\frac{1}{2N}{\sum\limits_{n = {- N}}^{N}{ɛ\left\lbrack \left\langle {{\Psi\left( {{n - k},\omega} \right)},{\Psi\left( {{n - m},\omega} \right)}} \right\rangle \right\rbrack}}}}} \\ {= {\lim\limits_{N\rightarrow\infty}{\frac{1}{2N}{\sum\limits_{n = {- N}}^{N}{ɛ\left\lbrack \left\langle {{\Psi\left( {n,\omega} \right)},{\Psi\left( {{n - \left( {m - k} \right)},\omega} \right)}} \right\rangle \right\rbrack}}}}} \end{matrix}$ depends only on the difference m−k.

Thus, the modified Levinson algorithm, as detailed above, can be applied to the toeplitz sequence Ψ₀, Ψ₁, . . . , Ψ_(M), . . . ε

(X,Ω,P) to produce the Levinson parameters

$\quad\left\{ \begin{matrix} {{\Psi_{0} = {{- {\sum\limits_{m = 1}^{M}\;{a_{m}^{(M)}\Psi_{m}}}} + {\mathbb{e}}^{(M)}}},{{\mathbb{e}}^{(M)}\bot\Psi_{1}},\ldots\;,\Psi_{M}} \\ {{\Psi_{M} = {{- {\sum\limits_{m = 0}^{M - 1}\;{b_{m}^{(M)}\Psi_{m}}}} + f^{(M)}}},{f^{(M)}\bot\Psi_{0}},\ldots\;,\Psi_{M - 1},} \\ {a_{1}^{(M)},\ldots\;,a_{M}^{(M)},b_{0}^{(M)},\ldots\mspace{11mu},\;{b_{M - 1}^{(M)} \in A},{\mathbb{e}}^{(M)},{f^{(M)} \in {P\left( {X,\Omega,P} \right)}}} \end{matrix} \right.$

Of course,

(X,Ω,P) is usually infinite-dimensional. However, when A is hermitian regular, as with M(n,n,D), D=

,

,

, the Levinson algorithm applies without any changes.

The modified Levinson algorithm can be computed using any computing system, as that described in FIG. 5.

FIG. 5 illustrates a computer system 500 upon which an embodiment according to the present invention can be implemented. The computer system 500 includes a bus 501 or other communication mechanism for communicating information and a processor 503 coupled to the bus 501 for processing information. The computer system 500 also includes main memory 505, such as a random access memory (RAM) or other dynamic storage device, coupled to the bus 501 for storing information and instructions to be executed by the processor 503. Main memory 505 can also be used for storing temporary variables or other intermediate information during execution of instructions by the processor 503. The computer system 500 may further include a read only memory (ROM) 507 or other static storage device coupled to the bus 501 for storing static information and instructions for the processor 503. A storage device 509, such as a magnetic disk or optical disk, is coupled to the bus 501 for persistently storing information and instructions.

The computer system 500 may be coupled via the bus 501 to a display 511, such as a cathode ray tube (CRT), liquid crystal display, active matrix display, or plasma display, for displaying information to a computer user. An input device 513, such as a keyboard including alphanumeric and other keys, is coupled to the bus 501 for communicating information and command selections to the processor 503. Another type of user input device is a cursor control 515, such as a mouse, a trackball, or cursor direction keys, for communicating direction information and command selections to the processor 503 and for controlling cursor movement on the display 511.

According to one embodiment of the invention, the process of FIG. 3 is provided by the computer system 500 in response to the processor 503 executing an arrangement of instructions contained in main memory 505. Such instructions can be read into main memory 505 from another computer-readable medium, such as the storage device 509. Execution of the arrangement of instructions contained in main memory 505 causes the processor 503 to perform the process steps described herein. One or more processors in a multi-processing arrangement may also be employed to execute the instructions contained in main memory 505. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions to implement the embodiment of the present invention. Thus, embodiments of the present invention are not limited to any specific combination of hardware circuitry and software.

The computer system 500 also includes a communication interface 517 coupled to bus 501. The communication interface 517 provides a two-way data communication coupling to a network link 519 connected to a local network 521. For example, the communication interface 517 may be a digital subscriber line (DSL) card or modem, an integrated services digital network (ISDN) card, a cable modem, a telephone modem, or any other communication interface to provide a data communication connection to a corresponding type of communication line. As another example, communication interface 517 may be a local area network (LAN) card (e.g. for Ethernet™ or an Asynchronous Transfer Model (ATM) network) to provide a data communication connection to a compatible LAN. Wireless links can also be implemented. In any such implementation, communication interface 517 sends and receives electrical, electromagnetic, or optical signals that carry digital data streams representing various types of information. Further, the communication interface 517 can include peripheral interface devices, such as a Universal Serial Bus (USB) interface, a PCMCIA (Personal Computer Memory Card International Association) interface, etc. Although a single communication interface 517 is depicted in FIG. 5, multiple communication interfaces can also be employed.

The network link 519 typically provides data communication through one or more networks to other data devices. For example, the network link 519 may provide a connection through local network 521 to a host computer 523, which has connectivity to a network 525 (e.g. a wide area network (WAN) or the global packet data communication network now commonly referred to as the “Internet”) or to data equipment operated by a service provider. The local network 521 and network 525 both use electrical, electromagnetic, or optical signals to convey information and instructions. The signals through the various networks and the signals on network link 519 and through communication interface 517, which communicate digital data with computer system 500, are exemplary forms of carrier waves bearing the information and instructions.

The computer system 500 can send messages and receive data, including program code, through the network(s), network link 519, and communication interface 517. In the Internet example, a server (not shown) might transmit requested code belonging an application program for implementing an embodiment of the present invention through the network 525, local network 521 and communication interface 517. The processor 503 may execute the transmitted code while being received and/or store the code in storage device 59, or other non-volatile storage for later execution. In this manner, computer system 500 may obtain application code in the form of a carrier wave.

The term “computer-readable medium” as used herein refers to any medium that participates in providing instructions to the processor 505 for execution. Such a medium may take many forms, including but not limited to non-volatile media, volatile media, and transmission media. Non-volatile media include, for example, optical or magnetic disks, such as storage device 509. Volatile media include dynamic memory, such as main memory 505. Transmission media include coaxial cables, copper wire and fiber optics, including the wires that comprise bus 501. Transmission media can also take the form of acoustic, optical, or electromagnetic waves, such as those generated during radio frequency (RF) and infrared (IR) data communications. Common forms of computer-readable media include, for example, a floppy disk, a flexible disk, hard disk, magnetic tape, any other magnetic medium, a CD-ROM, CDRW, DVD, any other optical medium, punch cards, paper tape, optical mark sheets, any other physical medium with patterns of holes or other optically recognizable indicia, a RAM, a PROM, and EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave, or any other medium from which a computer can read.

Various forms of computer-readable media may be involved in providing instructions to a processor for execution. For example, the instructions for carrying out at least part of the present invention may initially be borne on a magnetic disk of a remote computer. In such a scenario, the remote computer loads the instructions into main memory and sends the instructions over a telephone line using a modem. A modem of a local computer system receives the data on the telephone line and uses an infrared transmitter to convert the data to an infrared signal and transmit the infrared signal to a portable computing device, such as a personal digital assistant (PDA) or a laptop. An infrared detector on the portable computing device receives the information and instructions borne by the infrared signal and places the data on a bus. The bus conveys the data to main memory, from which a processor retrieves and executes the instructions. The instructions received by main memory can optionally be stored on storage device either before or after execution by processor.

Accordingly, the present invention provides an approach for performing signal processing. Multi-dimensional data (e.g., three- and four-dimensional data) can be represented as quaternions. These quaternions can be employed in conjunction with a linear predictive coding scheme that handles autocorrelation matrices that are not invertible and in which the underlying arithmetic is not commutative. The above approach advantageously avoids the time-warping and extends linear prediction techniques to a wide class of signal sources.

While the present invention has been described in connection with a number of embodiments and implementations, the present invention is not so limited but covers various obvious modifications and equivalent arrangements, which fall within the purview of the appended claims. 

1. A method for providing linear prediction, the method comprising: collecting multi-channel data from a plurality of independent sources; representing the multi-channel data as vectors of quaternions; generating an autocorrelation matrix corresponding to the quaternions; and outputting linear prediction coefficients based upon the autocorrelation matrix, wherein the linear prediction coefficients represent a compression of the collected multi-channel data.
 2. A method according to claim 1, wherein the data in the representing step includes at least one of 3-dimensional data and 4-dimensional data.
 3. A method according to claim 1, wherein the multi-channel data represents one of video signals, and voice signals.
 4. A method for supporting video compression, the method comprising: collecting time series video signals as multi-channel data, wherein the multi-channel data is represented as vectors of quaternions; generating an autocorrelation matrix corresponding to the quaternions; and outputting linear prediction coefficients based upon the autocorrelation matrix.
 5. A method according to claim 4, further comprising: transmitting the linear prediction coefficients over a data network to a remote video display for displaying images represented by the video signals that are generated from the transmitted linear prediction coefficients.
 6. A method of signal processing, the method comprising: receiving multi-channel data; representing multi-channel data as vectors of quaternions; and performing linear prediction based on the quaternions.
 7. A method according to claim 6, further comprising: outputting an autocorrelation matrix corresponding to the quaternions, wherein the linear prediction is performed based on the autocorrelation matrix.
 8. A method according to claim 6, wherein the data in the representing step includes at least one of 3-dimensional data and 4-dimensional data.
 9. A method according to claim 6, wherein the multi-channel data represents one of video signals, and voice signals.
 10. A method of performing linear prediction, the method comprising: representing multi-channel data as a pseudo-invertible matrix; generating a pseudo-inverse of the matrix; and outputting a plurality of linear prediction weight values and associated residual values based on the generating step.
 11. A method according to claim 10, wherein the multi-channel data is represented as a vector of quaternions.
 12. A method according to claim 10, further comprising: computing Levinson parameters corresponding to the matrix, wherein the plurality of linear prediction weight values and associated residual values is based on the computed Levinson parameters.
 13. A method according to claim 10, wherein the matrix has scalars that are non-commutative.
 14. A method according to claim 10, wherein the multi-channel data is represented as elements of a random path module.
 15. A computer-readable medium carrying one or more sequences of one or more instructions for performing signal processing, the one or more sequences of one or more instructions including instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of: receiving multi-channel data; representing multi-channel data as vectors of quaternions; and performing linear prediction based on the quaternions.
 16. A computer-readable medium according to claim 15, wherein the one or more processors further perform the step of: outputting an autocorrelation matrix corresponding to the quaternions, wherein the linear prediction is performed based on the autocorrelation matrix.
 17. A computer-readable medium according to claim 15, wherein the data in the representing step includes at least one of 3-dimensional data and 4-dimensional data.
 18. A computer-readable medium according to claim 15, wherein the multi-channel data represents one of video signals, and voice signals.
 19. A computer-readable medium carrying one or more sequences of one or more instructions for performing linear prediction, the one or more sequences of one or more instructions including instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of: representing multi-channel data as a pseudo-invertible matrix; generating a pseudo-inverse of the matrix; and outputting a plurality of linear prediction weight values and associated residual values based on the generating step.
 20. A computer-readable medium according to claim 19, wherein the multi-channel data is represented as a vector of quaternions.
 21. A computer-readable medium according to claim 19, wherein the one or more processors further perform the step of: computing Levinson parameters corresponding to the matrix, wherein the plurality of linear prediction weight values and associated residual values is based on the computed Levinson parameters.
 22. A computer-readable medium according to claim 19, wherein the matrix has scalars that are non-commutative.
 23. A computer-readable medium according to claim 19, wherein the multi-channel data is represented as elements of a random path module. 